Giter VIP home page Giter VIP logo

crispr-licht's Introduction

DA0010 - Example analysis of CRISPR-LICHT real data


tested on Debian11, gpu114, also with SLURM20 last modified 8 July 2022

Get example CRISPR-LICHT lineage tracing reads

cd /nfs/scistore16/itgrp/bioinf/projects/DA0010/original-paper-data-analysis-jean-01Apr22
module load samtools/1.14
module load bcftools/1.14

mkdir -p snakemake2

Read 1 and index 1 reads

samtools fastq -@24 -0 snakemake2/SRR11868220-orig-R1.fastq \
--i1 snakemake2/SRR11868220-orig-I1.fastq \
--index-format i11 --quality-tag QT \
--barcode-tag BC  \
CBYRBANXX_1_20180406B_20180408.bam &

just index 2 reads

samtools fastq -@24 -0 /dev/null \
--i1 snakemake2/SRR11868220-orig-I2.fastq \
--index-format i9 \
--quality-tag Q2 \
--barcode-tag B2 CBYRBANXX_1_20180406B_20180408.bam &

gRNAs from Simon Haendeler that they used in the original experiment

The first letters such as ACTG1 are the gene name for the guide RNA The next letters are the level of evidence (compare fig2 B in https://science.sciencemag.org/content/370/6519/935) and is set to "nC" for neutral controls and "dC" for cell proliferation controls (mnemonic for "depletion/death controls") and "pC" for positive controls (which were not further analyzed). The last number such as for 1 for ACTG1_AJ_1 is guide RNA number for that gene. The captial letters such as GGAAGAAGAGATCGCCGCGC are the guide RNA sequence.

gRNA.txt Click to expand!
ACTG1_AJ_1	GGAAGAAGAGATCGCCGCGC
ACTG1_AJ_2	CCGTGTTTCCTTCCATCGTC
ACTG1_AJ_3	TGACGATGCCATGCTCAATG
ACTG1_AJ_4	AGAGTCCATGACAATGCCAG
AKT3_LOE3_1	TTACCATTGTGAAAGAAGGT
AKT3_LOE3_2	ACACCCGCTCTCTCGACAAA
AKT3_LOE3_3	TTGGTTCGAGAGAAGGCAAG
AKT3_LOE3_4	ATTTCATGTAGATACTCCAG
ALG12_LOE3_1	GCGAAAGCACGTAAACCGCG
ALG12_LOE3_2	GCGTGCAGTAGAACATCAGG
ALG12_LOE3_3	AGTCTTGGCTGTACAAAGCG
ALG12_LOE3_4	CAGCGAAAGCACGTAAACCG
ALG1_LOE2_1	GCATGGCCAACGACAGCGCG
ALG1_LOE2_2	GCGCGTGGTACTGCATACGG
ALG1_LOE2_3	ACTCCGTACTGGAAAACTCG
ALG1_LOE2_4	CCACTTGGCCAGCAGAACGA
ALG3_LOE3_1	GCACATGAAGAAAAAGACGA
ALG3_LOE3_2	CCGAGGCACTGACATCCGCA
ALG3_LOE3_3	CTATGACTATACCCAACTGC
ALG3_LOE3_4	ACTGAGGAAGAGCAGCACCA
ANKRD11_AJ_1	AGGCACTTACACTTCCAGCG
ANKRD11_AJ_2	GGTGCGGAGGTGAACACCAA
ANKRD11_AJ_3	GTGAACCTCCTGTTAGGCAA
ANKRD11_AJ_4	ACAGGAGGTTCACCATCGTG
ARFGEF2_LOE2_1	ACCTTCTAGGAATGTCCGCA
ARFGEF2_LOE2_2	AGTAAGGGTAGCCTTGGCAG
ARFGEF2_LOE2_3	AGATGTTCTTCCAACCTGAG
ARFGEF2_LOE2_4	AGGGCCCTCAGACTGATGAA
ARNT2_LOE3_1	GGATGTGCATGGGCTCGCGG
ARNT2_LOE3_2	GATTCCATCGGAGTTATGCC
ARNT2_LOE3_3	GATGGCTTTGATCCTCAGGG
ARNT2_LOE3_4	TCATCCATGAGGATGTGCAT
ASNS_LOE3_1	AGGATCAGATGAACTTACGC
ASNS_LOE3_2	ACAACAGTTCGTGCTTCAGT
ASNS_LOE3_3	TCCTTCTCCTGAAAAAGCCG
ASNS_LOE3_4	GGTCGCCAGAGAATCTCTTT
ASPM_LOE1_1	GCGTCAAATGGCACATAGCA
ASPM_LOE1_2	GCTCGGAAAAGAAAGAGCGA
ASPM_LOE1_3	TCTGCAGTTAGAGACCTTGG
ASPM_LOE1_4	GTATTTAATAGGCTATGTGG
ASXL3_LOE3_1	TCTTTGCATATGCAGCACAA
ASXL3_LOE3_2	ATTCAGACAGGCTAATGGAG
ASXL3_LOE3_3	TATCCGCAACTGCATTTCTG
ASXL3_LOE3_4	TCTTGTCAACACTAACTTGA
ATM_LOE1_1	TGATAGAGCTACAGAACGAA
ATM_LOE1_2	AAACAATTAAACATCTAGAT
ATM_LOE1_3	TCCAGAATTTTCAAGCCAGA
ATM_LOE1_4	GCCTGCAACCTGCTAAGTGT
ATP6V0C_dC_1	GCCTATGGCACAGCCAAGAG
ATP6V0C_dC_2	TTGCGGCCATGTCTGTCATG
ATR_LOE2_1	TGAGGCTTAGTAAAGCCCTC
ATR_LOE2_2	ACAATGCTCTCCTTAATGCA
ATR_LOE2_3	AGTCAATTAGATGAACACAT
ATR_LOE2_4	CTAGTAGCATAGCTCGACCA
ATRIP_AJ_1	TGAAGAACTGTACAAGGGGG
ATRIP_AJ_2	GCGAGACTCACTACATCAGA
ATRIP_AJ_3	GCTTGTATCCTGACTCCGTC
ATRIP_AJ_4	GCCAGCAGCACGCTAGGCAG
ATRX_AJ_1	TTACCTGTAAAGTCTTACCA
ATRX_AJ_2	TGAATTCTATACGATCAAGG
ATRX_AJ_3	GAGTTCAGTTGATCATCAAG
ATRX_AJ_4	CATCACTCTGACATCTACCA
BCOR_LOE2_1	GCTGCACCTTAGAGAACGAT
BCOR_LOE2_2	TGATCCCTCAGGAGTCCAGG
BCOR_LOE2_3	TCAGTGATGTTAGTCCCCTG
BCOR_LOE2_4	ATTGCAGCGAAATATGCGGG
BLM_AJ_1	TACGGCCACAGCTAATCCCA
BLM_AJ_2	GTAGTCAGTAAACAGTCCCC
BLM_AJ_3	TGATAATTGCTGTAAAACAA
BLM_AJ_4	TCACTTGATGGCCCTATGGA
BRF1_LOE2_1	CCTCTACCTGGTCTGCCGTA
BRF1_LOE2_2	AATTCAGGCCAGTACCGCCA
BRF1_LOE2_3	CTACCTGGTCTGCCGTACGG
BRF1_LOE2_4	GGATGTACTGGGCGAGCACA
CASC5_LOE3_1	TCCTGTACTGATAATTTGGA
CASC5_LOE3_2	GCAATAGCCAAAAAAGAACG
CASC5_LOE3_3	TTAGTGATACGACACAAGAT
CASC5_LOE3_4	CATAAAGTCTATTTGAGCAA
CASK_LOE1_1	CATCGTCGTACAACACTGAA
CASK_LOE1_2	GTAAGGATGATCATAATTGG
CASK_LOE1_3	GAAAGCCTGTAGACGTCTGG
CASK_LOE1_4	TTACCTTCTGTACTTAACCC
CCDC88A_LOE3_1	ATAGCACCCCAGAAATCCCT
CCDC88A_LOE3_2	GGTTGCCGCACATATTCAAG
CCDC88A_LOE3_3	AATGGCACAGAAACAAAGTA
CCDC88A_LOE3_4	GATTTTCTATATCTAAACCC
CDC16_dC_1	GACATTTCCCAGTCGCTGGA
CDC16_dC_2	GGTAAACAACTTGCATGGAA
CDC45_AJ_1	AGCCAGCTCAAACATCACCA
CDC45_AJ_2	GAACCAGCGTATATTGCACG
CDC45_AJ_3	TAGTGGAGCAAACCATGCGG
CDC45_AJ_4	CATACTGCTCGTAGTCAAAG
CDC6_LOE3_1	TCAGGAGATTTGTCAGGAAG
CDC6_LOE3_2	TCAACTGGACAGCAAAGGCC
CDC6_LOE3_3	TTTGCGACAGACTTTACTGT
CDC6_LOE3_4	TCACAGATAGAATTCTACCT
CDK5RAP2_LOE1_1	GCTCCGCATCTATTTCCTTG
CDK5RAP2_LOE1_2	CTGCTCCAGTTGCTTCCGCA
CDK5RAP2_LOE1_3	CCTTGAGGAAAGAATGCAAC
CDK5RAP2_LOE1_4	ATAATCAGAGAGCGCATGAG
CDK6_LOE2_1	GCCGCTCTCCACCATCCGCG
CDK6_LOE2_2	CCAGCAGTACGAATGCGTGG
CDK6_LOE2_3	CAACACTCCAGAGATCCACG
CDK6_LOE2_4	TTAGATCGCGATGCACTACT
CDKL5_AJ_1	ATGTCCACGGACTTTCCATA
CDKL5_AJ_2	CAGAGTACGTTGCCACCAGA
CDKL5_AJ_3	TTATTTTTCAGCGCTCCCTA
CDKL5_AJ_4	TACCTTCACCTACAACCCCA
CDT1_LOE3_1	GGCCTCCTACCGCTTCCGCC
CDT1_LOE3_2	GCATGTCAAGGAGCACCACA
CDT1_LOE3_3	ACCTTTGCCAAGGTCCAGCG
CDT1_LOE3_4	AGCGGCTGCAGCGCTTAGAA
CENPE_LOE2_1	TGTATGGCAGAATCGATGAT
CENPE_LOE2_2	GCAGGTACTATATTTGCCTA
CENPE_LOE2_3	ACATCTTCTCGAATAATTAA
CENPE_LOE2_4	TCAGGTGTGCGGCTCAAGGA
CENPF_LOE3_1	GGTTGAAAATGAAAAAACCG
CENPF_LOE3_2	CCTGATGCCGGGCAATGTCG
CENPF_LOE3_3	ACACCAAGTCAATATTATAG
CENPF_LOE3_4	TATGGGCTAGTCTTTCCTGT
CENPJ_LOE2_1	ACTCACACGACATACCCGGA
CENPJ_LOE2_2	CGTGTTATACTGTTTCCCAA
CENPJ_LOE2_3	GACTAACCAGAGCACTTCCG
CENPJ_LOE2_4	GCTGGGGTCATATTAAATCG
CEP135_LOE2_1	GAAGGAGGTCTGCAATGTAA
CEP135_LOE2_2	TGTTATCAGCCACTTGAAGG
CEP135_LOE2_3	GAGATGAGTCACCAGCCACA
CEP135_LOE2_4	TCTCAACTGAAAGATACGAA
CEP152_LOE2_1	GTATCTGAAAGATAACCGGA
CEP152_LOE2_2	GATTTCCAGTTGGATCCGCT
CEP152_LOE2_3	ACAGGAAGTGACACATTCGA
CEP152_LOE2_4	CCTACCGGACTGCAACAAGT
CEP63_LOE3_1	GTTGGATGTGACACATAAGG
CEP63_LOE3_2	GCAACAGGTATCTTCACTGG
CEP63_LOE3_3	CCAAAAATCACAGGGAAGAT
CEP63_LOE3_4	TTAAGTGTTGAATTTCTGAT
CIT_LOE3_1	TGAACCAGAGAAGAATTCGT
CIT_LOE3_2	GTTCACAGCGTTCATCTGAT
CIT_LOE3_3	TATCTCGAAGCACAAGCCCG
CIT_LOE3_4	ACTGGAGACAAGATTGCGGG
COG7_LOE3_1	CCAAGAGGTGAACCACGCCG
COG7_LOE3_2	TGTTGAAGCCCTAAAACAGG
COG7_LOE3_3	AGCCAGTCCACAGATTGTAG
COG7_LOE3_4	TCGAGCAGCAGCTAGCCAAC
CREBBP_LOE3_1	GTACGAGTCTGCCAACAGCA
CREBBP_LOE3_2	ATTGCCCCCCTCCAAACACG
CREBBP_LOE3_3	GCGTCCACAGCAATATCCAA
CREBBP_LOE3_4	GGTAGCCTATGCTAAGAAAG
CSNK2A1_LOE3_1	GATTGATCATGAGCACAGAA
CSNK2A1_LOE3_2	GTTTCAATGATATCTTGGGC
CSNK2A1_LOE3_3	TGGACATGACAATTATGATC
CSNK2A1_LOE3_4	CTAGTTGGTGAGGATAGCCA
CTCF_LOE3_1	GTTTGTGCAGTTATGCCAGC
CTCF_LOE3_2	GTGTGATTACGCTTGTAGAC
CTCF_LOE3_3	TACACTGGCGTAATCGCACA
CTCF_LOE3_4	CGATCCAAATTTGAACGCCG
CTNNB1_LOE3_1	TTACCCAAGCATTTTCACCA
CTNNB1_LOE3_2	TTAGTCCTCTGATAACAATT
CTNNB1_LOE3_3	TGTCTGCTCTAGTAATAAGC
CTNNB1_LOE3_4	ACATCCAAAGAGTAGCTGCA
DDB2_LOE1_1	GGGGCGTAATACAATCTCGG
DDB2_LOE1_2	ACAACAATGAGGTTGTAGCG
DDB2_LOE1_3	GGAGGGCTACATCCTTGGCG
DDB2_LOE1_4	TGACTCAGACTGCCTCTGGG
DHCR7_LOE3_1	TGTCCTTCGCAGCGAAGCAG
DHCR7_LOE3_2	TCTGCCATGACCACTTCGGG
DHCR7_LOE3_3	GGGCTACTACATCTTCCGGG
DHCR7_LOE3_4	GCGGCTTTCCTCGTTATAGG
DKC1_LOE3_1	GATTCGACGGATACTTCGGG
DKC1_LOE3_2	TAGGCCCTAGAAACTCTGAC
DKC1_LOE3_3	GTGGCTGTATGATAACCACA
DKC1_LOE3_4	TCAGATGCAGGAGCTTCGGA
DNA2_AJ_1	GCCACACGCCAAATGCTAAT
DNA2_AJ_2	GCAATAATACCAATATCAGA
DNA2_AJ_3	CATCTTGACTTAGATTTAAG
DNA2_AJ_4	TGCTTCACGGTTTAGCACCA
DNM1L_LOE3_1	ATAGTGTTAAGGCCACCAAG
DNM1L_LOE3_2	AAATAGCTACGGTGAACCCG
DNM1L_LOE3_3	GCTAATGAACAATAATATAG
DNM1L_LOE3_4	GGCACAAATAAAGCAGGACG
DONSON_AJ_1	GCTCCAATACGAGGGAACAG
DONSON_AJ_2	GGGCTTCCGAAAGCCGCCCG
DONSON_AJ_3	GGCTCTGCTGGAAGGTACAA
DONSON_AJ_4	GTTTTAATACTCCAGTCCAC
DPAGT1_LOE3_1	GATGCAGGCCAAGTATCGGG
DPAGT1_LOE3_2	AGCAGGAATTAACGGCCTAG
DPAGT1_LOE3_3	ACAGGCCTACTTACCAGTTG
DPAGT1_LOE3_4	GTGAAATAGACCATGAGGAG
DPM1_LOE3_1	ACCAGCAGCCACACGATGAG
DPM1_LOE3_2	ACTACGACTGACTTCCAAGG
DPM1_LOE3_3	ACTCGCTACAAAGGAAATGG
DPM1_LOE3_4	GTTGAATTATACTATTGGCG
DPP6_AJ_1	TGGTGTCAGAAGTATGACCG
DPP6_AJ_2	TGATGCCGCCTGATGATCCA
DPP6_AJ_3	GTACCTCTCGGAAAACGCAG
DPP6_AJ_4	ACTTTACCACCACCGCGCCG
DYM_LOE3_1	CTTACCTGCTAGTTGGTGCA
DYM_LOE3_2	TAGTGAGTTGAAACTCTTGG
DYM_LOE3_3	AGTAGAAGCTATATCAACAA
DYM_LOE3_4	GAATGGTTCTTATTACCACC
DYNC1H1_LOE3_1	GGTATTCGCGAATGGCCCCA
DYNC1H1_LOE3_2	GTGTCATTGTCAAATAGCAG
DYNC1H1_LOE3_3	CGAGGCTGTACTGAAACCCA
DYNC1H1_LOE3_4	ACCCACCTAGTCATTTCACG
DYRK1A_LOE1_1	GAGAAACACCAATTTCCGAG
DYRK1A_LOE1_2	GATGGTTTCGAAACATAAAG
DYRK1A_LOE1_3	TTACAGGAGTACAAACCACC
DYRK1A_LOE1_4	TTTCAACCAAAATACACCCG
EEF1A2_LOE3_1	GGTGGTCTCGAACTTCCAGA
EEF1A2_LOE3_2	TGGACTTTCCGGAGTCCACG
EEF1A2_LOE3_3	TCATCATCCTGAACCACCCG
EEF1A2_LOE3_4	TGTTCTTGATGAAGTCGCGG
EFTUD2_LOE3_1	GAAGTACTACCCAACAGCCG
EFTUD2_LOE3_2	AGAGCGGCTGATCAAGCATG
EFTUD2_LOE3_3	GCTGGCTTGCGCATCTCAGA
EFTUD2_LOE3_4	GCATCGTTCAAGGTTTCCAG
EIF2AK3_LOE3_1	ACTGCAATTATGCTATCAAG
EIF2AK3_LOE3_2	ACTCAGCTCTCCTAGCCCAA
EIF2AK3_LOE3_3	TGAAATATCTAACAATGCCC
EIF2AK3_LOE3_4	CATCTTCCTGCAGATCGCAG
EIF6_dC_1	GCTCTCCGATACCATCCCCG
EIF6_dC_2	GTGCACCACGGGGATGGTAT
EOMES_LOE3_1	AATCTCCTGTCTCATCCAGT
EOMES_LOE3_2	AGAACCGTGCCACAGACCAA
EOMES_LOE3_3	CTACAATGTGTTCGTAGAGG
EOMES_LOE3_4	CTCCTAATACTGGTTCCCAC
EP300_LOE3_1	TCATGCTTCTGACAAAACCG
EP300_LOE3_2	GTATTGTTAGTTACACAACA
EP300_LOE3_3	CATATGCTCGGAAAGTTGAA
EP300_LOE3_4	GTTCTGGTAACTGTAATAAG
ERCC1_LOE1_1	ATTACGTCGCCAAATTCCCA
ERCC1_LOE1_2	GGTACCGCCCAGCTTCCTCG
ERCC1_LOE1_3	GGTGCAGGTTGTGGTAGCGG
ERCC1_LOE1_4	ATTGCGCACGAACTTCAGTA
ERCC2_LOE1_1	ACGTCCTTCCCAAAGCGCAG
ERCC2_LOE1_2	GCACCGGTCAAGGGTCCGGC
ERCC2_LOE1_3	TTTGGCGTAGGTGCTGACAA
ERCC2_LOE1_4	GATCCTGCATGCCAATGTGG
ERCC3_LOE1_1	TAGTTTGTACTCATGCACAT
ERCC3_LOE1_2	CTCAGCCAAGATGTTCCGAA
ERCC3_LOE1_3	ATGGCAACGGAGCAGCCGAT
ERCC3_LOE1_4	AAATGGCTTACTCAACCAAG
ERCC4_LOE1_1	CGACCCCTATGCTCTGACAA
ERCC4_LOE1_2	TGAAGTGGAGCCAAGATACG
ERCC4_LOE1_3	TTTCATTTGTTACACGGCGA
ERCC4_LOE1_4	ATACTTCTCTGACTCGGGAA
ERCC5_LOE1_1	TTCGGGGCTGACCTGCCGCC
ERCC5_LOE1_2	TCAGCCCCGAAGCGCTGGAA
ERCC5_LOE1_3	AGCTGCTGGAGTGCTCCGGG
ERCC5_LOE1_4	GTCAGCCCCGAAGCGCTGGA
ERCC6_LOE2_1	AGAATTGCCACTCTGAACGG
ERCC6_LOE2_2	AGACAGAATGATCCGATGAG
ERCC6_LOE2_3	CCATAAATCCATACCTACTG
ERCC6_LOE2_4	GGTACCATCCATCTTGAGAT
ERCC8_LOE1_1	GCCAAGATATAGTCATAACG
ERCC8_LOE1_2	TGATGTTCACAGATACAGTG
ERCC8_LOE1_3	TCCTGTTCTCACATTCTACA
ERCC8_LOE1_4	GTTTTTGTCCGCACGCCAAA
ESCO2_AJ_1	TTTTGGGTGCGTCCCAGTGA
ESCO2_AJ_2	GTTGCAAATAACTTGCCATC
ESCO2_AJ_3	ATTTTTTGGCTCAGAACCCG
ESCO2_AJ_4	CGTGTCCTGTCTGAACCAAT
FGFR1_LOE2_1	GTTGCCCGCCAACAAAACAG
FGFR1_LOE2_2	GCATGGTTGACCGTTCTGGA
FGFR1_LOE2_3	TCTTACCCACGACATCCAGC
FGFR1_LOE2_4	GTTCCTCCACAGGCACACCG
FOXG1_LOE2_1	AGCGCGTTGTAGCTGAACGG
FOXG1_LOE2_2	TCGAGCGACGACGTGTTCAT
FOXG1_LOE2_3	CCCGGGTCGTCGTAGTGGCG
FOXG1_LOE2_4	AGCGACGACGTGTTCATCGG
GAPDH_dC_1	TGTAGTTGAGGTCAATGAAG
GAPDH_dC_2	TGCTGGCGCTGAGTACGTCG
GMNN_AJ_1	GTATATGGCAGAGCTAATAG
GMNN_AJ_2	TCTGCATCTGGATCTCTTGT
GMNN_AJ_3	AGTGATTCAAAATTATCCAG
GMNN_AJ_4	CTCTCAGTATTGGAAGGAAG
HCCS_LOE2_1	TCATGCATCGGGCATCCTGA
HCCS_LOE2_2	GGAACGCGCCTATGAGTACG
HCCS_LOE2_3	ACATACTTACGCAGCATGAA
HCCS_LOE2_4	TCACAATCAGAATAACGAGC
HDAC8_AJ_1	ATGCTGGACATACTTGACCG
HDAC8_AJ_2	ATGCACTGCATAAGCAGATG
HDAC8_AJ_3	CTATGCAGCAGCTATAGGAG
HDAC8_AJ_4	TCCTGGGGAGAATTTGTGCA
HMGA2_AJ_1	GCGCCTCAGAAGAGAGGACG
HMGA2_AJ_2	ACTGGAGAAAAACGGCCAAG
HMGA2_AJ_3	GCCCTCTCCTAAGAGACCCA
HMGA2_AJ_4	TTATTACTCACCCATTTCCT
h_non-targeting_nC_10	GGGCCCGCATAGGATATCGC
h_non-targeting_nC_1	CGTATTCGACTCTCAACGCG
h_non-targeting_nC_2	GATGGCGCTTCAGTCGTCGG
h_non-targeting_nC_3	ACGGAGGCTAAGCGTCGCAA
h_non-targeting_nC_4	CGCTTCCGCGGCCCGTTCAA
h_non-targeting_nC_5	ATCGTTTCCGCTTAACGGCG
h_non-targeting_nC_6	GTAGGCGCGCCGCTCTCTAC
h_non-targeting_nC_7	CCATATCGGGGCGAGACATG
h_non-targeting_nC_8	TACTAACGCCGCTCCTACAG
h_non-targeting_nC_9	TGAGGATCATGTCGAGCGCC
hrrm1_dC_1	TATTGAAAGATCTTACCGAG
hrrm1_dC_2	GCGATGCATGTGATCAAGCG
hrrm1_dC_3	CCTTGTACCCCAATTCCAAT
hrrm1_dC_4	TGAGAACCTTCCAGTCCCAT
IER3IP1_LOE3_1	TCTGTAAGAACCGTGATGAG
IER3IP1_LOE3_2	GGAACAGACCAGGGAATTGG
IER3IP1_LOE3_3	CGCCATCGCAGTGCTGCACG
IER3IP1_LOE3_4	GATGGCGTTGACGCAGAGCA
IGF1_AJ_1	CGCCAGGTAGAAGAGATGCG
IGF1_AJ_2	TCCGGAGCTGTGATCTAAGG
IGF1_AJ_3	TCAGTTCGTGTGTGGAGACA
IGF1_AJ_4	GTATGGCTCCAGCAGTCGGA
IGF1R_AJ_1	TCGGGCAAGGACCTTCACAA
IGF1R_AJ_2	TGTTTCCGAAATTTACCGCA
IGF1R_AJ_3	CCGAGAGCAGCGACTCCGAG
IGF1R_AJ_4	GATTTCACAGTCAAAATCGG
IGF2_AJ_1	ACTTACTGAAGTAGAAGCCG
IGF2_AJ_2	AGCACTCCTCAACGATGCCA
IGF2_AJ_3	TGGGTGGGTAGAGCAATCAG
IGF2_AJ_4	GTAAGCAGCAATGCAGCACG
IQSEC2_LOE3_1	TTCCTGTCAGACACACCGGT
IQSEC2_LOE3_2	GCATCTACCAGCGCATCCAG
IQSEC2_LOE3_3	GGAGGCTCAATGGTAAGCAG
IQSEC2_LOE3_4	GGAATTCCCCTATCATCTGC
KAT6A_LOE2_1	TTTGGCATACGGGTGAGTGG
KAT6A_LOE2_2	TCGAGATCAAGGCAAAAATG
KAT6A_LOE2_3	CCACTCACCCGTATGCCAAA
KAT6A_LOE2_4	TAGAAAAGTAGCCAACAAGG
KAT6B_AJ_1	AACGCGCTGTGAATAATGGG
KAT6B_AJ_2	CACATGATCGACAAGAGAGA
KAT6B_AJ_3	ATGCGGTCAGTACTTCCCAT
KAT6B_AJ_4	TTTGGCATTCTGGAAAGTGG
KATNB1_LOE2_1	GCTGAACCGGAGACACCGCA
KATNB1_LOE2_2	TGATGGACCACAGGTTGACG
KATNB1_LOE2_3	CTACAAACTCGCCGTACGGG
KATNB1_LOE2_4	GAGGCTACAAACTCGCCGTA
KCNJ6_LOE3_1	GTTCATCCCGTTGAACCAGA
KCNJ6_LOE3_2	ATAGAGACAGAAACCACCAT
KCNJ6_LOE3_3	TGGTTGATCGCATACATACG
KCNJ6_LOE3_4	TCAGGACGTCGAAAGCCCAG
KDM5C_LOE3_1	CCAAGGCTACAACTTTGCCG
KDM5C_LOE3_2	GGGCTACCCGAGCCCACCGA
KDM5C_LOE3_3	GGGCCTCACGAGCCTCAGCC
KDM5C_LOE3_4	TCACCAGTGGAGGTAGTTAA
KIF11_LOE2_1	GGATTGTTCATCAATTGGCG
KIF11_LOE2_2	GACTGAATTACCTTGTTACG
KIF11_LOE2_3	TCTTATCAACAGCTCCAGAA
KIF11_LOE2_4	ACAGGAAACTCTGAGTACAT
KIF14_LOE3_1	CTTCTTCTAGCTGAGAGTGA
KIF14_LOE3_2	TAGCAATGTTGACTATTAAA
KIF14_LOE3_3	TTAGAGTAATGGTAGTAACA
KIF14_LOE3_4	TCAACATATGGTCCATAAAC
KIF5C_LOE3_1	CGAAGCGGAGATCCTCCGCG
KIF5C_LOE3_2	AGGCAAAGCAAACCGACACG
KIF5C_LOE3_3	AGATATCATGGGCAATTCGT
KIF5C_LOE3_4	CCTCTGTCCGAACATCAGTG
KMT2A_AJ_1	CTTGTGTGAACTGTACTGAG
KMT2A_AJ_2	ATATTGCGACCACCAAACTT
KMT2A_AJ_3	GGATGTGAGACAGCAACCCA
KMT2A_AJ_4	GGATCATTAAGACCCCTCGG
LARP7_AJ_1	TGAAAAAATTGACTACTGAT
LARP7_AJ_2	GTGATCACGAACAAAGGTAT
LARP7_AJ_3	AAGTCTACTGGAGATCCAAA
LARP7_AJ_4	TCTCCAGTAGACTTATAATG
LIG4_LOE1_1	GCATCTCCATGAGTTCCAGT
LIG4_LOE1_2	GACGACCTAGAACAACTTAG
LIG4_LOE1_3	TCTGCAATAGCAGCTAGCAT
LIG4_LOE1_4	TATTGGGGTAAAGGATCACG
MCM4_AJ_1	GATGTTCTAAGATTGAGTCA
MCM4_AJ_2	CTGGCTTCCTCACTTAGCCG
MCM4_AJ_3	GCTCTTGTCCTGAGTGACAA
MCM4_AJ_4	CCAAGTCGTTGCATGTATAG
MCPH1_LOE1_1	CCAGGAGATCTATCATGCCG
MCPH1_LOE1_2	GGATGACCACACTTCAACAT
MCPH1_LOE1_3	GCTCGTTTCGGTGCTCTGGG
MCPH1_LOE1_4	GAGAGGCGTAAAGCTCGTTT
MECP2_LOE1_1	GGACACGGAAGCTTAAGCAA
MECP2_LOE1_2	AGAAAAGTACAAACACCGAG
MECP2_LOE1_3	GCCAGCTAAGACTCAGCCCG
MECP2_LOE1_4	GTTGATTGCGTACTTCGAAA
MED12_AJ_1	TCCAACCTGCCCATGCCAGA
MED12_AJ_2	CTGGTCGTAAAGAACCCCAA
MED12_AJ_3	GCATTGGTAAGCGCACAGGA
MED12_AJ_4	TCAGCTGCAAGCGTTCTGGT
MED17_LOE3_1	TGAAGAGGTTAACTGTGCCG
MED17_LOE3_2	GGTTTTTCACCACAATGTGA
MED17_LOE3_3	CTAGGAATTTGGACATCAAG
MED17_LOE3_4	GCAGCACAAATCTTATTGAA
MED25_LOE3_1	AGCCTCCACTGCATTCTGCG
MED25_LOE3_2	TATCCATACCTGGTACTGCG
MED25_LOE3_3	TGATTCACGTAGACCTGGCA
MED25_LOE3_4	GGATTCAGATGAGATCCATG
MRE11A_AJ_1	GCAATCATGACGATCCCACA
MRE11A_AJ_2	TTTCAGGTGGACTATAGTGG
MRE11A_AJ_3	CAATCATGACGATCCCACAG
MRE11A_AJ_4	GTACTTTCAAACCGCAGAGA
MYCN_LOE2_1	GCACAGACTGTAGCCATCCG
MYCN_LOE2_2	GATCCGGAGGCGATTCTGCG
MYCN_LOE2_3	GGACTGGGCGGTGGAACCGG
MYCN_LOE2_4	CCCCGTTCGTTTTAATACCG
MYO18B_LOE3_1	GCTTTTGGAGAAGAGCCGCG
MYO18B_LOE3_2	TGAGATGCCGAGCATCTCCA
MYO18B_LOE3_3	GCACCAGGGCAAGGACCGGG
MYO18B_LOE3_4	CTTCCGGCTGCCGTGCCACG
NAA10_LOE3_1	AGGCTGTGAAGCGTTCCCAC
NAA10_LOE3_2	AGCCAGACCGAGGCGCCGGT
NAA10_LOE3_3	ACGAGAATGGGAAGATTGTG
NAA10_LOE3_4	CCATGGACATATCACCTCAT
NBN_LOE1_1	ATATACCTTTCAATTTGTGG
NBN_LOE1_2	GCTAAGATCAAATAAAAAAA
NBN_LOE1_3	TGTGAAAGTTCAAAAACAGG
NBN_LOE1_4	CAAGCTATATTGCAACTTGG
NDE1_LOE2_1	AGGCCTGGACGACTCCACCG
NDE1_LOE2_2	ACTGACCCCTTATCACCCAA
NDE1_LOE2_3	ATCGTACACGAGGTTCCGGC
NDE1_LOE2_4	GTAGGTCTCCCACAATGTTG
NGLY1_LOE3_1	ACTAGACTCTTGCCTGTCAG
NGLY1_LOE3_2	GAGCTTCAAACCCTACAGCT
NGLY1_LOE3_3	GGCTGTGTTTCCAATCCGGA
NGLY1_LOE3_4	CCTGTAGACCCATTTCACCT
NHEJ1_LOE3_1	GGCATTACAGTGCCAAGTGA
NHEJ1_LOE3_2	GTTGGTTTCAGATCTTCAAC
NHEJ1_LOE3_3	GGAGATTATCCAAATGACAG
NHEJ1_LOE3_4	GCAAGCTGTAGCCACGCCCA
NIPBL_LOE2_1	TTACCCTTGCTAGAATACTG
NIPBL_LOE2_2	CTTTAGCTTGATATGCAACG
NIPBL_LOE2_3	GCTCGTTCTGATTTTAACCG
NIPBL_LOE2_4	TTACTGCTGTGTCATCAAAG
NSDHL_LOE3_1	GCAAGGGTTTGATAATCCCC
NSDHL_LOE3_2	TCATGGCATTTTCGGCCCAA
NSDHL_LOE3_3	GAGGGCGTCGATATCAAGAA
NSDHL_LOE3_4	CCCACCATCCAGTAACAACA
NSUN2_AJ_1	GGATGCCTGGAATCACACAG
NSUN2_AJ_2	GCTGCGGATTGCAACACGCG
NSUN2_AJ_3	TGCTCGTCCATCAAGCCAAG
NSUN2_AJ_4	ATGTGTCTAATGAACTGCCA
OCLN_LOE3_1	GGTAACAAAGATCACCAACG
OCLN_LOE3_2	TGTTGATCTGAAGTGATAGG
OCLN_LOE3_3	ACCCACTCCTCGACATTGGG
OCLN_LOE3_4	GTAAGCTCTTGTATTCCTGT
ORC1_LOE3_1	ACAAATGCCGTTTACAGGCA
ORC1_LOE3_2	GCGTATCACTTCATGAACAG
ORC1_LOE3_3	GGAGATCACCTGTTGAAACG
ORC1_LOE3_4	ACAGTTTGTGTTGATTGAAG
ORC4_LOE3_1	GCTTGCTTCCATTAGATCTA
ORC4_LOE3_2	ATAGCATGTGTAATGACCGC
ORC4_LOE3_3	AATTCATTAAGTGTATCTGC
ORC4_LOE3_4	TCGCCCTAAAGGAAATCACA
ORC6_LOE3_1	CGGCCTCTCCGCACGCACCA
ORC6_LOE3_2	ACCTACTTACCCTGTCCAAG
ORC6_LOE3_3	GCTGGTCTCCGTGGTGCGTG
ORC6_LOE3_4	TAAACTTTCTGGTTTGAACA
PAFAH1B1_LOE1_1	CTAAAAAAAGTGGTAAACCT
PAFAH1B1_LOE1_2	ACACAGAGAATGGGTACGTA
PAFAH1B1_LOE1_3	TTTCTGTAGACTAAAAAAAG
PAFAH1B1_LOE1_4	GAATGCATCAGAACCATGCA
PCNT_LOE1_1	GATCACCGAGAAATTCAGTG
PCNT_LOE1_2	GTGATGCCAATACAACCCCA
PCNT_LOE1_3	GGATGGTCTGAAATTCAGAG
PCNT_LOE1_4	GCGCAATGTAGCCCTCAGGG
PGAP3_LOE3_1	ATACTTACAGTCGTCCCGAC
PGAP3_LOE3_2	AGTGTATGTGGGTCACCGTT
PGAP3_LOE3_3	GGTTGTAGCCATAGTCGAAG
PGAP3_LOE3_4	GTCCACAGTTTTCCACACCA
PHC1_AJ_1	ATAGGGTAAACCCCAGACAT
PHC1_AJ_2	ACACCTCCTCTACACTCCAA
PHC1_AJ_3	CATCCTGATGCCTAATGGGG
PHC1_AJ_4	TATCTTTCAGGTTCAGAACT
PIGA_LOE3_1	TCAGGGATCGATTTGCTTAG
PIGA_LOE3_2	GGATATTTCTGACAGAGTTC
PIGA_LOE3_3	GGTATATGACCGGGTATCAG
PIGA_LOE3_4	GTTTGTATCACAAAGAGACA
PLK4_LOE2_1	TCATGCACCAGATCATCACA
PLK4_LOE2_2	TTGACTGTGTCAGTGTCGAA
PLK4_LOE2_3	GTTTAATGATGGGTCCCAGT
PLK4_LOE2_4	TCATATTACGAGTCAGTAGG
PNKP_LOE2_1	GGACATATCCGGCCGACACG
PNKP_LOE2_2	ACATGACCATGTCTGACACG
PNKP_LOE2_3	CTTCAGGGCTGTCTCACACG
PNKP_LOE2_4	GCAGGCGCGCCACAACAACC
POC1A_AJ_1	CGAAGGAGAGAAGTTCACAC
POC1A_AJ_2	TCAGGTTCTCCCCCGACGGG
POC1A_AJ_3	ACTGGGGTGGAAGTCCACAT
POC1A_AJ_4	CTGTTGCCTTTTCAAGAACG
POLH_LOE1_1	GGTGAGGTTAGCTTTCCCAC
POLH_LOE1_2	AATGAGAGCAGCCATAGAGA
POLH_LOE1_3	AGGCAACTACCCAAAACCAT
POLH_LOE1_4	GTGAGCATCATAGCGGGTAA
POLR2D_dC_1	GCAGCAGAATGAGAGTGCAG
POLR2D_dC_2	GCGGGCTGGCGACGTAGAGG
PQBP1_LOE2_1	AACACCTTGTACCAGCTTGG
PQBP1_LOE2_2	ACAGCGGGCTCCCTTACTAC
PQBP1_LOE2_3	GTGTCTGCATTCCAGTAGTA
PQBP1_LOE2_4	GTCTGCAGCGCAACGGGCAG
PRKDC_AJ_1	GTATGAATAGCTCTGAACTC
PRKDC_AJ_2	GTTGAAGTCCAGGTTTACAA
PRKDC_AJ_3	TATTGACAAAGCTATGCACG
PRKDC_AJ_4	GCTTGTTATTAATACAGAAG
PRPF19_dC_1	GCGCTCCGTGGTTAGCACAG
PRPF19_dC_2	GCTCATCGAGAAGTACATTG
PTEN_pC_2	ATATCACCACACACAGGTAA
PTEN_pC_3	ACTTTGATATCACCACACAC
PTEN_pC_4	CTCCCGCTCCTGGAGCGGGG
PYCR2_LOE3_1	CTATTATCTTGTGAGCCGAC
PYCR2_LOE3_2	GACACGCAGCAACAAGGAGA
PYCR2_LOE3_3	TCACCGTGTTCGGATACAGG
PYCR2_LOE3_4	GCGCCTGGCAATCCAACTCG
QARS_LOE3_1	GTGACTCCGCACATACTCAA
QARS_LOE3_2	GAGGCTAGTGAAGAGCGACA
QARS_LOE3_3	ACTTGACTCGATAGGCTACA
QARS_LOE3_4	AGGCTGCTATTAACAGGCAC
RAB18_LOE3_1	AGGAGTCAAACTGTCACACA
RAB18_LOE3_2	TTAACTCCCAGCTATTATAG
RAB18_LOE3_3	TTTGCTTACGTATTGCCACA
RAB18_LOE3_4	GGCACATTGTACACCATCAC
RAB3GAP1_LOE3_1	GGAGATGGCAAATTTAAGGC
RAB3GAP1_LOE3_2	AATTAGGTACATCGGCAGAG
RAB3GAP1_LOE3_3	TTTAGCTAAATTAGGTACAT
RAB3GAP1_LOE3_4	TCTTGAAATGCGTTTCCGAT
RAB3GAP2_LOE3_1	TTATGTACGCTTCTACACTG
RAB3GAP2_LOE3_2	AACATGCGCAGACAACAGAG
RAB3GAP2_LOE3_3	TGTCTCAGTATATCACTGTA
RAB3GAP2_LOE3_4	GCTGGGAGTACGTTGTTCAG
RAD21_AJ_1	TAAATTACACTCGAACACAT
RAD21_AJ_2	TTGACCTGCCTGAGGAAAAT
RAD21_AJ_3	CTTCATTACAGTCTGCAAGA
RAD21_AJ_4	TCTGTTCAGACTCTAATAGG
RAD50_LOE3_1	GTCCAACCAAAATTGTAAGG
RAD50_LOE3_2	GCGGAATTATAACTACCGAG
RAD50_LOE3_3	TATTTCCTTTGGTTCCAGGA
RAD50_LOE3_4	CGGCAGGTACGTCAGACACA
RAN_dC_1	AGTCAAATGACGTTTCACGA
RAN_dC_2	ATGAACCTCAACACCCAAGG
RARS2_LOE3_1	TCGAATTTCTAGAAAAGGAA
RARS2_LOE3_2	ATTCCCTGTACTACTCCAAA
RARS2_LOE3_3	GTCATCTTTTAGGTTCGACG
RARS2_LOE3_4	GTGCCCTTTGGAGTAGTACA
RBBP8_LOE3_1	GATTCGTTCCTGTTTTAGCT
RBBP8_LOE3_2	CGTTGCCCAAAGATTCCCCA
RBBP8_LOE3_3	TCAGTCTCCAAACTTTCACG
RBBP8_LOE3_4	TCACCAAAAATCAACAGCTG
RBM10_AJ_1	AAGCAGACCCAACTGAACCG
RBM10_AJ_2	GCAGGACGCTACACGATGGA
RBM10_AJ_3	GTCTTGCCGTCGATAGTGAG
RBM10_AJ_4	CACCATGGATTCCATCCTGG
RGS6_LOE3_1	CAGGTACTGACATTGTGCAG
RGS6_LOE3_2	AAGCTCTTGACTGTTCTGAT
RGS6_LOE3_3	GGGAGCCTTATCGCTGCCCA
RGS6_LOE3_4	ACAGTTGAAGCAATACACTT
RIMS3_LOE3_1	CGAGCTGGACCTCAGCGCCG
RIMS3_LOE3_2	CAGGCTCTGCTCTTTGACGA
RIMS3_LOE3_3	TGAAGCACTTGTGGTCCATG
RIMS3_LOE3_4	GGTTCATCTTCCCCACTACC
RPL32_dC_1	CGTAACTGGCGGAAACCCAG
RPL32_dC_2	AGGGTTCGTAGAAGATTCAA
RPL3_dC_1	ATGGAGATCCAGGTGAACGG
RPL3_dC_2	GAAGGTGAAGAGCTTCCCTA
RPS19_dC_1	GGAAGCTGAAAGTCCCCGAA
RPS19_dC_2	GCTGAAAGTCCCCGAATGGG
RPS6KA3_LOE2_1	CCAGAAGTAGTTAATCGTCG
RPS6KA3_LOE2_2	ATTCAGCTTTTCAAACTGAA
RPS6KA3_LOE2_3	ATGCACATCAGCTTTTTCGG
RPS6KA3_LOE2_4	TTGTGCGTGAAGATATTCAA
RTTN_LOE3_1	ACGCTGATCTCATTCAGGAG
RTTN_LOE3_2	GCATCGCCTGGCATTACAGT
RTTN_LOE3_3	GCTCCAGTGTCTGAAAGTGA
RTTN_LOE3_4	GTGACTCTGTCAGAACCAAG
SHH_LOE1_1	GAGGAGTCTCTGCACTACGA
SHH_LOE1_2	AGAGGAGTCTCTGCACTACG
SHH_LOE1_3	GCGCCAGCGGAAGGTATGAA
SHH_LOE1_4	GCTGCGGTCGCGGTCAGACG
SIN3A_LOE3_1	TCTGCAATAGGAATCTACGG
SIN3A_LOE3_2	ATCCCATTATCAGATCGGGG
SIN3A_LOE3_3	GGAGTCCGTCCTGTACACTT
SIN3A_LOE3_4	TTAGCGCTGGTCAGACTACG
SLC1A4_LOE3_1	GAGTTGCGCAATGAACACCG
SLC1A4_LOE3_2	CACCATTGCCATTATCCTGG
SLC1A4_LOE3_3	AACAACGTAGAGCTCAACGC
SLC1A4_LOE3_4	GTCATGAGTAGGCAGCCCAA
SLC25A19_LOE2_1	GGGTGCGCAGAACATCCACG
SLC25A19_LOE2_2	GCTCACCCTGAGCTGCAAAG
SLC25A19_LOE2_3	CCCCAGGTACGGAGATACAA
SLC25A19_LOE2_4	GACCCCAGCGCAAAGTACCA
SLC25A22_LOE3_1	GTACATGCCGAAGTAGCCCT
SLC25A22_LOE3_2	ACTTCTCCTCGGACGCCGGG
SLC25A22_LOE3_3	CGGCTCCAGTCACTTCAGCG
SLC25A22_LOE3_4	GCAGGTGACACCGATCAGCC
SLC2A1_AJ_1	GGAGTTCTACAACCAGACAT
SLC2A1_AJ_2	GGATGCTCTCCCCATAGCGG
SLC2A1_AJ_3	GATGTCCTATCTGAGCATCG
SLC2A1_AJ_4	TGGCCCGGTTCTCCTCGTTG
SLC9A6_AJ_1	AGTTCTATGCTGAGACTCCG
SLC9A6_AJ_2	GAGATACTGCCACTTATGCA
SLC9A6_AJ_3	AATTAAGTAAGAGGGACAAG
SLC9A6_AJ_4	GTCACATTATTTACATCACT
SMARCAL1_AJ_1	AGTATTTGCACACCATAAGG
SMARCAL1_AJ_2	TTGTCTTTTAGGTTGCCAAG
SMARCAL1_AJ_3	GAAGTACTCACCTTTAGGAC
SMARCAL1_AJ_4	CATCGCAGCCTTTTACCGGA
SMC1A_LOE2_1	GCAGGAGTTTGAAGAACGGA
SMC1A_LOE2_2	GCTGGCATAGGTCAATGAGG
SMC1A_LOE2_3	CCATCGCAAGAAAAATATTG
SMC1A_LOE2_4	GTACAAGATCAACAACAAAG
SMC3_LOE2_1	GTCATCTTCGTCCAGAACAG
SMC3_LOE2_2	GCTGGTTTATTCCTTTTCGA
SMC3_LOE2_3	AAAATTTTAGATCAACCAGA
SMC3_LOE2_4	TTTTTCTTAGATCGATAAAG
SMU1_dC_1	AGTAGTCTCCTCCTGCAAGG
SMU1_dC_2	TGTGCTCGATTTTCTCCAGA
SNAP29_LOE2_1	GTACGAGTCCGAGAAGGTTG
SNAP29_LOE2_2	CCTCATGTACGAGTCCGAGA
SNAP29_LOE2_3	TGACCGGCTGACAACCAAAG
SNAP29_LOE2_4	GCATTAAGAGCGTGTTTGGG
SNRPF_dC_1	GCTTTCCTGTTAGTCCATTG
SNRPF_dC_2	GGCTATCTGGTATCTGTAGA
SPOCK1_LOE3_1	TAGCCCTCCTCATTACACCG
SPOCK1_LOE3_2	TGACTCTGTTCGCATCCTCG
SPOCK1_LOE3_3	GAGTCACACGAGTTGAAAAG
SPOCK1_LOE3_4	TGGCCACTCCTACACATCCA
SRCAP_AJ_1	GATAGCGCCAGTTCTTGCGA
SRCAP_AJ_2	CATGGTGATCCGGCACCACG
SRCAP_AJ_3	TTTACTGCGCCGAGTTAAGG
SRCAP_AJ_4	TCTGGGCTCCATAGTAAGTG
SSR4_LOE3_1	GGAGATCTCCACAATGAAGA
SSR4_LOE3_2	GCCTCTGTTTACAGTCAGCG
SSR4_LOE3_3	ATGGCTCTCTATGCTGACGT
SSR4_LOE3_4	AGCACCTCAGTGGACACCCA
STAMBP_LOE2_1	TGGAGTTGAGATTATCCGAA
STAMBP_LOE2_2	GTCGACACTGGAGAGAAACG
STAMBP_LOE2_3	AATGGCATCCATTTACTCTG
STAMBP_LOE2_4	CTAGCGGGCCACCTAGGCCA
STIL_LOE1_1	ATGACGATAAGCAAGTCGGA
STIL_LOE1_2	CATGGAATCAAAAAGTCCCC
STIL_LOE1_3	TCTGTTTAGATATCTTACCA
STIL_LOE1_4	GCGCTCAGTTCACAACGGAT
TAF2_LOE3_1	GTACCGCCATTGGATTAAAG
TAF2_LOE3_2	TTTGTGGTACCCAGTGTAGA
TAF2_LOE3_3	GATGTGGTATGTTTCAGCAA
TAF2_LOE3_4	GTTGATGCAGACGACCTGAT
TagBFP_nC_1	AGTCACCACATACGAAGACG
TagBFP_nC_2	AAGTGCACATCCGAGGGCGA
TagBFP_nC_3	CACGCCCCCGTCTTCGTATG
TagBFP_nC_4	GAGACGCTGTACCCCGCTGA
TBC1D20_AJ_1	ACGTCCGGCGGTCATTGCGG
TBC1D20_AJ_2	GGATTCACCTGGTCAATGAT
TBC1D20_AJ_3	CTACTGGTACCTGAGGTGGT
TBC1D20_AJ_4	AAGTGTTGCTGGACGTCCGG
TBCD_LOE3_1	ACAGTTGGATCTGTTACCGA
TBCD_LOE3_2	CCTTTGTGACTGCAATCTCG
TBCD_LOE3_3	TGATCTTCATGGTAACCAGG
TBCD_LOE3_4	ACAGCTGACTATTTTGCCGT
TCF4_AJ_1	GCAAGACACGAAATCTTCGG
TCF4_AJ_2	TGGCGAGTCCCTATTGTAGT
TCF4_AJ_3	ATGTCCACTTTCCATCGTAG
TCF4_AJ_4	TGCGTGTCTGAAAAGAAGGG
tomato_nC_1	GCGGCCCCCTGCCCTTCGCC
tomato_nC_2	TCACCTTCAGCTTGGCGGTC
tomato_nC_3	GAAGAAGACCATGGGCTGGG
tomato_nC_4	GCATTACGGGGCCGTCGGGG
TRAIP_AJ_1	GGTTTGAGACAGCACCAAGT
TRAIP_AJ_2	CTTCGATCACTCCCGCGACG
TRAIP_AJ_3	GCTCCAAACTCACTGCTCCA
TRAIP_AJ_4	TCGACCGCCTGGTTTTAGAG
TRAPPC9_LOE2_1	CGACATGTGGTAATGCACCA
TRAPPC9_LOE2_2	CCAATTATCGCACACAACCG
TRAPPC9_LOE2_3	GTACCCAGTGACGCTCGTCG
TRAPPC9_LOE2_4	AATTGGTCTACGCCTCCCGA
TRIO_LOE3_1	TTACATCCATACATTCCCGG
TRIO_LOE3_2	TCTCGGCTAGACGTACCTGT
TRIO_LOE3_3	GATCTGCCTGATCTGCTGCG
TRIO_LOE3_4	CCCACAGTGCAAATTTGCAA
TRMT10A_LOE3_1	ATGTTACGCAGAAAACCGAC
TRMT10A_LOE3_2	TTTTACTTGACAAGCCACGG
TRMT10A_LOE3_3	TACTAATCCTCCAATCACAT
TRMT10A_LOE3_4	ACTATCTTGCCCCAACGGAA
TUBB2B_LOE3_1	TGCCTGGCTCCAGATCCACG
TUBB2B_LOE3_2	CTTTCTGTCTCAGTTTTGGG
TUBB2B_LOE3_3	CTAACCGAATCCATCGTGCC
TUBB2B_LOE3_4	GGATCTGGAGCCAGGCACGA
TUBB_LOE3_1	ACTGCCATCTTGAGGCCACG
TUBB_LOE3_2	ACTGCATTCCAGGTCAGTCT
TUBB_LOE3_3	CCACGGCCGATACCTCACCG
TUBB_LOE3_4	GTCTGGGGCAGGTAACAACT
TUBG1_LOE3_1	AGATGGTAGTGACAGTCTAG
TUBG1_LOE3_2	CGGCTACACCCCTCTCACTA
TUBG1_LOE3_3	TGTCATTCAGCCGTTCTAAG
TUBG1_LOE3_4	GTGCCTGAGGAGCGATGCCG
TUBGCP6_LOE2_1	TGTTTTATAGCGACGCCAGG
TUBGCP6_LOE2_2	CACCTGGCAGAGCGAAAACG
TUBGCP6_LOE2_3	ACAACTCACGGTGGGAAACG
TUBGCP6_LOE2_4	TGTCGGATGTGGTTTCCACC
UBE3A_LOE1_1	CTAGCTAGAGATGATCGCTA
UBE3A_LOE1_2	CTACTACCACCAGTTAACTG
UBE3A_LOE1_3	GCTTACCTTGAGAACTCGAA
UBE3A_LOE1_4	CTATGCAAATGTAGTGGGAG
VPS13B_LOE2_1	AGTGAAAGCTGTAGATCCGA
VPS13B_LOE2_2	GTATTGATGGTAATTACCAC
VPS13B_LOE2_3	CAAAATCATCAATCAAACCG
VPS13B_LOE2_4	GCTCTCAGCAGTACTACGGT
VRK1_LOE3_1	GTAGGATTACCCATTGGCCA
VRK1_LOE3_2	ACGAGCATCGATGCACACAA
VRK1_LOE3_3	TTTGTAGCCCCATCAAGACG
VRK1_LOE3_4	GCTCTGGTTTTGCAGCTCGT
WDR4_LOE3_1	TGTGGTCTACATCTTCCAGC
WDR4_LOE3_2	GCTGAAGCTGGCGTCTGCGC
WDR4_LOE3_3	ACAGGCCTGTGGGCGACCAG
WDR4_LOE3_4	GAGTTGGCACCACGGAGATA
WDR62_LOE1_1	GTCCAAGTTCCAGAAGCGAA
WDR62_LOE1_2	GCCCAAGCCACACTACCTTG
WDR62_LOE1_3	CTATGCGCGGAACGATGCAG
WDR62_LOE1_4	GCTGTGGTCCTTATACACGC
WDR73_LOE2_1	CCAGGGCTGCGATTCTCCAA
WDR73_LOE2_2	AGAGATTTCAAAGTGCGCCA
WDR73_LOE2_3	GAGTGAAGAGAGGTTCTACT
WDR73_LOE2_4	CACACCTGCAGATAACAACC
WWOX_LOE3_1	CCAAGATCACATGTGCACCA
WWOX_LOE3_2	GTTTTTTAACAGTCACACCG
WWOX_LOE3_3	GTTTCTTGTTCCCATCCGTA
WWOX_LOE3_4	CCATCTTTGGTGAGACTCCA
XPA_LOE1_1	AATCCACATCATTCACAATG
XPA_LOE1_2	AGAAGCAAAGGAAGTCCGAC
XPA_LOE1_3	TACCTGCAGTTATCACAAGT
XPA_LOE1_4	CTCGATACTCGCCCGCACCG
XPC_LOE1_1	CTTCTCCACGACAATACCCA
XPC_LOE1_2	TGATGGATACATCGTCTGCG
XPC_LOE1_3	GTTTGGCTACTGGCAGACAG
XPC_LOE1_4	ACATTCCCAAACTCGTTCCG
XRCC4_AJ_1	AGAAGCTGATGACATGGCAA
XRCC4_AJ_2	TCAGTACTCTCATCATAGAC
XRCC4_AJ_3	TGTAGAAAATGAGTTATACT
XRCC4_AJ_4	TGTGAGTGCTAAGGAAGCTT
ZBTB18_LOE3_1	GCTGGTCCTTGTAAAAGAGG
ZBTB18_LOE3_2	TGTGACTGCACTGTTCTGGT
ZBTB18_LOE3_3	CAAGCAGGAGAGCGAAAGCG
ZBTB18_LOE3_4	GCACATGAAGATCTGGCCCG
ZEB2_LOE3_1	GCAAACGTGTAGCTACAGAG
ZEB2_LOE3_2	GTCTGTTCAGACATACTAAG
ZEB2_LOE3_3	GTATTCCTACTGCAAGCGGG
ZEB2_LOE3_4	TGTTTGCGCCTCTTGCACCG

convert the gRNAs to FASTA (one single concatenated sequence)

cut -f 2 gRNAs.txt |perl -pe "s/\n//g" |fold -w 60|sed '1i >gRNAs' > gRNAs.fasta

make a annotation file corresponding to the above FASTA

echo "##description: gRNAs from Simon Haendeler" > gRNAs.gtf
echo "##provider: Jean Elbers" >> gRNAs.gtf
echo "##format: gtf" >> gRNAs.gtf
echo "##date: 2021-04-01" >> gRNAs.gtf

i=1
while read a b
do
let k=`echo $b|wc -L`
let j=$i+$k-1
  echo -e "gRNAs\tMAKER\tgene\t${i}\t${j}\t.\t+\t.\tgene_id \"${a}\"; gene_name \"${a}\";" >> gRNAs.gtf
  echo -e "gRNAs\tMAKER\ttranscript\t${i}\t${j}\t.\t+\t.\tgene_id \"${a}\"; transcript_id \"${a}.T\"; gene_name \"${a}\";" >> gRNAs.gtf
  echo -e "gRNAs\tMAKER\texon\t${i}\t${j}\t.\t+\t.\tgene_id \"${a}\"; transcript_id \"${a}.T\"; gene_name \"${a}\";" >> gRNAs.gtf
let i=$i+$k
done < gRNAs.txt

Make STAR index for the gRNA FASTA and annotations

version

/nfs/scistore16/itgrp/jelbers/git/STAR/source/STAR

Usage: STAR  [options]... --genomeDir /path/to/genome/index/   --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2022

STAR version=2.7.10a_alpha_220601
STAR compilation time,server,dir=2022-06-23T13:48:46+02:00 :/nfs/scistore16/itgrp/jelbers/git/STAR/source
For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>

To list all parameters, run STAR --help

make index

/nfs/scistore16/itgrp/jelbers/git/STAR/source/STAR \
--genomeChrBinNbits 9 --genomeSAindexNbases 7 \
--runMode genomeGenerate \
--genomeDir snakemake2/genome-star-idx \
--genomeFastaFiles gRNAs.fasta \
--sjdbGTFfile gRNAs.gtf \
--sjdbOverhang 19

Convert the FASTQ files into a format for umi-tools 1.1.2

cd snakemake2

bgzip -@24 SRR11868220-orig-I1.fastq && \
bgzip -@24 SRR11868220-orig-I2.fastq && \
bgzip -@24 SRR11868220-orig-R1.fastq &

paste -d "" \
<(bgzip -cd -@4 SRR11868220-orig-I1.fastq.gz|paste - - - - |cut -f 1,2|awk '{print $1,"\n",$2,"\n+","\n","???????????";}' OFS='') \
<(bgzip -cd -@4 SRR11868220-orig-I2.fastq.gz|paste - - - - |cut -f 2|awk '{print "\n",$1,"\n","\n","?????????";}' OFS='') \
<(bgzip -cd -@4 SRR11868220-orig-R1.fastq.gz|paste - - - - |cut -f 2,4|awk '{print "\n",$1,"\n\n",$2;}' OFS='') | \
bgzip -c -@ 24 > SRR11868220-orig-I1-I2-R1.fastq.gz &

negative binomial code

see https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file so you can recreate the conda environment, using crispr-licht-04May2022.yml below

mamba create -n crispr-licht-04May2022 -c bioconda -c conda-forge bioconductor-deseq2=1.32.0 \
r-data.table r-dplyr r-optparse r-reshape2 r-MASS r-doparallel r-foreach

conda activate crispr-licht-04May2022
conda env export |sed '$ d' > crispr-licht-04May2022.yml
crispr-licht-04May2022.yml Click to expand!
name: crispr-licht-04May2022
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - _r-mutex=1.0.1=anacondar_1
  - binutils_impl_linux-64=2.36.1=h193b22a_2
  - binutils_linux-64=2.36=hf3e587d_9
  - bioconductor-annotate=1.70.0=r41hdfd78af_0
  - bioconductor-annotationdbi=1.54.0=r41hdfd78af_0
  - bioconductor-biobase=2.52.0=r41hd029910_0
  - bioconductor-biocgenerics=0.38.0=r41hdfd78af_0
  - bioconductor-biocparallel=1.26.0=r41h399db7b_0
  - bioconductor-biostrings=2.60.0=r41hd029910_0
  - bioconductor-delayedarray=0.18.0=r41hd029910_0
  - bioconductor-deseq2=1.32.0=r41h399db7b_0
  - bioconductor-genefilter=1.74.0=r41hba52eb8_0
  - bioconductor-geneplotter=1.70.0=r41hdfd78af_0
  - bioconductor-genomeinfodb=1.28.0=r41hdfd78af_0
  - bioconductor-genomeinfodbdata=1.2.7=r41hdfd78af_1
  - bioconductor-genomicranges=1.44.0=r41hd029910_0
  - bioconductor-iranges=2.26.0=r41hd029910_0
  - bioconductor-keggrest=1.32.0=r41hdfd78af_0
  - bioconductor-matrixgenerics=1.4.0=r41hdfd78af_0
  - bioconductor-s4vectors=0.30.0=r41hd029910_0
  - bioconductor-summarizedexperiment=1.22.0=r41hdfd78af_0
  - bioconductor-xvector=0.32.0=r41hd029910_0
  - bioconductor-zlibbioc=1.38.0=r41hd029910_0
  - bwidget=1.9.14=ha770c72_1
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2021.10.8=ha878542_0
  - cairo=1.16.0=ha61ee94_1011
  - curl=7.83.0=h2283fc2_0
  - expat=2.4.8=h27087fc_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.14.0=h8e229c2_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.10.4=h0708190_1
  - fribidi=1.0.10=h36c2ea0_0
  - gcc_impl_linux-64=10.3.0=hf2f2afa_16
  - gcc_linux-64=10.3.0=hc39de41_9
  - gettext=0.19.8.1=h73d1719_1008
  - gfortran_impl_linux-64=10.3.0=h73f4979_16
  - gfortran_linux-64=10.3.0=hb09a455_9
  - graphite2=1.3.13=h58526e2_1001
  - gsl=2.7=he838d99_0
  - gxx_impl_linux-64=10.3.0=hf2f2afa_16
  - gxx_linux-64=10.3.0=h2593f52_9
  - harfbuzz=4.2.1=hf9f4e7c_0
  - icu=70.1=h27087fc_0
  - jbig=2.1=h7f98852_2003
  - jpeg=9e=h166bdaf_1
  - kernel-headers_linux-64=2.6.32=he073ed8_15
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.19.3=h08a2579_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=3.0=h9c3ff4c_0
  - libblas=3.9.0=14_linux64_openblas
  - libcblas=3.9.0=14_linux64_openblas
  - libcurl=7.83.0=h2283fc2_0
  - libdeflate=1.10=h7f98852_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-devel_linux-64=10.3.0=he6cfe16_16
  - libgcc-ng=11.2.0=h1d223b6_16
  - libgfortran-ng=11.2.0=h69a702a_16
  - libgfortran5=11.2.0=h5c6108e_16
  - libglib=2.70.2=h174f98d_4
  - libgomp=11.2.0=h1d223b6_16
  - libiconv=1.16=h516909a_0
  - liblapack=3.9.0=14_linux64_openblas
  - libnghttp2=1.47.0=he49606f_0
  - libopenblas=0.3.20=pthreads_h78a6416_0
  - libpng=1.6.37=h21135ba_2
  - libsanitizer=10.3.0=h26c7422_16
  - libssh2=1.10.0=ha35d2d1_2
  - libstdcxx-devel_linux-64=10.3.0=he6cfe16_16
  - libstdcxx-ng=11.2.0=he4da1e4_16
  - libtiff=4.3.0=h542a066_3
  - libuuid=2.32.1=h7f98852_1000
  - libwebp-base=1.2.2=h7f98852_1
  - libxcb=1.13=h7f98852_1004
  - libxml2=2.9.14=h22db469_0
  - libzlib=1.2.11=h166bdaf_1014
  - lz4-c=1.9.3=h9c3ff4c_1
  - make=4.3=hd18ef5c_1
  - ncurses=6.3=h27087fc_1
  - openssl=3.0.3=h166bdaf_0
  - pango=1.50.7=hbd2fdc8_0
  - pcre=8.45=h9c3ff4c_0
  - pcre2=10.37=h032f7d1_0
  - pixman=0.40.0=h36c2ea0_0
  - pthread-stubs=0.4=h36c2ea0_1001
  - r-askpass=1.1=r41hcfec24a_2
  - r-backports=1.4.1=r41hcfec24a_0
  - r-base=4.1.3=h06d3f91_1
  - r-bh=1.78.0_0=r41hc72bb7e_0
  - r-bit=4.0.4=r41hcfec24a_0
  - r-bit64=4.0.5=r41hcfec24a_0
  - r-bitops=1.0_7=r41hcfec24a_0
  - r-blob=1.2.3=r41hc72bb7e_0
  - r-brio=1.1.3=r41hcfec24a_0
  - r-cachem=1.0.6=r41hcfec24a_0
  - r-callr=3.7.0=r41hc72bb7e_0
  - r-cli=3.3.0=r41h7525677_0
  - r-codetools=0.2_18=r41hc72bb7e_0
  - r-colorspace=2.0_3=r41h06615bd_0
  - r-crayon=1.5.1=r41hc72bb7e_0
  - r-curl=4.3.2=r41hcfec24a_0
  - r-data.table=1.14.2=r41hcfec24a_0
  - r-dbi=1.1.2=r41hc72bb7e_0
  - r-desc=1.4.1=r41hc72bb7e_0
  - r-diffobj=0.3.5=r41hcfec24a_0
  - r-digest=0.6.29=r41h03ef668_0
  - r-doparallel=1.0.17=r41hc72bb7e_0
  - r-dplyr=1.0.9=r41h7525677_0
  - r-ellipsis=0.3.2=r41hcfec24a_0
  - r-evaluate=0.15=r41hc72bb7e_0
  - r-fansi=1.0.3=r41h06615bd_0
  - r-farver=2.1.0=r41h03ef668_0
  - r-fastmap=1.1.0=r41h03ef668_0
  - r-foreach=1.5.2=r41hc72bb7e_0
  - r-formatr=1.12=r41hc72bb7e_0
  - r-futile.logger=1.4.3=r41hc72bb7e_1003
  - r-futile.options=1.0.1=r41hc72bb7e_1002
  - r-generics=0.1.2=r41hc72bb7e_0
  - r-getopt=1.20.3=r41ha770c72_2
  - r-ggplot2=3.3.6=r41hc72bb7e_0
  - r-glue=1.6.2=r41h06615bd_0
  - r-gtable=0.3.0=r41hc72bb7e_3
  - r-httr=1.4.3=r41hc72bb7e_0
  - r-isoband=0.2.5=r41h03ef668_0
  - r-iterators=1.0.14=r41hc72bb7e_0
  - r-jsonlite=1.8.0=r41h06615bd_0
  - r-labeling=0.4.2=r41hc72bb7e_1
  - r-lambda.r=1.2.4=r41hc72bb7e_1
  - r-lattice=0.20_45=r41hcfec24a_0
  - r-lifecycle=1.0.1=r41hc72bb7e_0
  - r-locfit=1.5_9.5=r41h06615bd_0
  - r-magrittr=2.0.3=r41h06615bd_0
  - r-mass=7.3_57=r41h06615bd_0
  - r-matrix=1.4_1=r41h0154571_0
  - r-matrixstats=0.62.0=r41h06615bd_0
  - r-memoise=2.0.1=r41hc72bb7e_0
  - r-mgcv=1.8_40=r41h0154571_0
  - r-mime=0.12=r41hcfec24a_0
  - r-munsell=0.5.0=r41hc72bb7e_1004
  - r-nlme=3.1_157=r41h8da6f51_0
  - r-openssl=2.0.0=r41h1f3e0c5_0
  - r-optparse=1.7.1=r41hc72bb7e_0
  - r-pillar=1.7.0=r41hc72bb7e_0
  - r-pkgconfig=2.0.3=r41hc72bb7e_1
  - r-pkgload=1.2.4=r41h03ef668_0
  - r-plogr=0.2.0=r41hc72bb7e_1003
  - r-plyr=1.8.7=r41h7525677_0
  - r-png=0.1_7=r41hcfec24a_1004
  - r-praise=1.0.0=r41hc72bb7e_1005
  - r-processx=3.5.3=r41h06615bd_0
  - r-ps=1.7.0=r41h06615bd_0
  - r-purrr=0.3.4=r41hcfec24a_1
  - r-r6=2.5.1=r41hc72bb7e_0
  - r-rcolorbrewer=1.1_3=r41h785f33e_0
  - r-rcpp=1.0.8.3=r41h7525677_0
  - r-rcpparmadillo=0.11.0.0.0=r41h43535f1_0
  - r-rcurl=1.98_1.6=r41hcfec24a_0
  - r-rematch2=2.1.2=r41hc72bb7e_1
  - r-reshape2=1.4.4=r41h03ef668_1
  - r-rlang=1.0.2=r41h7525677_0
  - r-rprojroot=2.0.3=r41hc72bb7e_0
  - r-rsqlite=2.2.8=r41h03ef668_0
  - r-rstudioapi=0.13=r41hc72bb7e_0
  - r-scales=1.2.0=r41hc72bb7e_0
  - r-snow=0.4_4=r41hc72bb7e_0
  - r-stringi=1.7.6=r41h30a9eb7_2
  - r-stringr=1.4.0=r41hc72bb7e_2
  - r-survival=3.3_1=r41h06615bd_0
  - r-sys=3.4=r41hcfec24a_0
  - r-testthat=3.1.4=r41h7525677_0
  - r-tibble=3.1.7=r41h06615bd_0
  - r-tidyselect=1.1.2=r41hc72bb7e_0
  - r-utf8=1.2.2=r41hcfec24a_0
  - r-vctrs=0.4.1=r41h7525677_0
  - r-viridislite=0.4.0=r41hc72bb7e_0
  - r-waldo=0.4.0=r41hc72bb7e_0
  - r-withr=2.5.0=r41hc72bb7e_0
  - r-xml=3.99_0.9=r41h06615bd_0
  - r-xtable=1.8_4=r41hc72bb7e_3
  - readline=8.1=h46c0cb4_0
  - sed=4.8=he412f7d_0
  - sysroot_linux-64=2.12=he073ed8_15
  - tk=8.6.12=h27826a3_0
  - tktable=2.10=hb7b940f_3
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libice=1.0.10=h7f98852_0
  - xorg-libsm=1.2.3=hd9c2040_1000
  - xorg-libx11=1.7.2=h7f98852_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h7f98852_1
  - xorg-libxrender=0.9.10=h7f98852_1003
  - xorg-libxt=1.2.1=h7f98852_2
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h7f98852_1002
  - xorg-xproto=7.0.31=h7f98852_1007
  - xz=5.2.5=h516909a_1
  - zlib=1.2.11=h166bdaf_1014
  - zstd=1.5.2=ha95c52a_0

modified script from Fabian Amman

this script calculates p values like the original paper but uses a negative binomial model with size factors and dispersion estimated with DESeq2 can include things such as batch

CRISPR-LICHT-pvalue-8July2022.R Click to expand!
library(DESeq2)
library(data.table)
library(dplyr)
library(optparse)
library(reshape2)
library(MASS)
library(doParallel)
library(foreach)

#we first register a backend specifying the number
# of cores that we wish to use
registerDoParallel(cores = 32)

# print session info
sessionInfo()

# define parameters
option_list = list(
    make_option(c("-f", "--file"), type="character", default="countMatrix.csv",
              help="path to input file holding the simulated cell lineage counts [default= %default]", metavar="character"),
    make_option(c("-o", "--out"), type="character", default="results_list.csv",
              help="path to output file holding the results of the diff. caling [default= %default]", metavar="character"),
    make_option(c("-b", "--bootstrap"), type="integer", default="10000",
              help="number of re-sampling during bootstrapping [default= %default]", metavar="character"),
    make_option(c("-l", "--minLCC"), type="integer", default="4",
              help="lower lineage cell count threshold to remove dead lineages [default= %default]", metavar="character"),
    make_option(c("-u", "--maxLCC"), type="integer", default="Inf",
              help="upper lineage cell count threshold to remove dead lineages [default= %default]", metavar="character"),
    make_option(c("-s", "--nTstring"), type="character", default="cntrl",
              help="regex string to match non-tageting controls (gene names) [default= %default]", metavar="character"),
    make_option(c("-w", "--winzor"), type="double", default="0.1",
              help="quantile remove prior to get trimmed means [default= %default]", metavar="character")
);

opt_parser = OptionParser(option_list=option_list, add_help_option=TRUE);
opt = parse_args(opt_parser);
if(is.na(opt$maxLCC)){
  opt$maxLCC <- Inf
}
print(opt)


R          = opt$bootstrap   # number of repeated bootstrap sampling steps
minLCC     = opt$minLCC      # minimal number of cells per lineage to be considererd (e.g.: 4)
maxLCC     = opt$maxLCC      # maximal number of cells per lineage to be considererd (e.g.: 17000)
nTstring   = opt$nTstring    # string specifying non-targeting gRNA data in count matrix
winzorizer = opt$winzor      # remove top and bottom percent to take means (makes it more robust against outlier)

# define functions
fishers_method <- function(x) return(pchisq(-2*sum(log(x)),df = 2*length(x),lower.tail=FALSE))

# read in count matrix
test <- read.table(opt$file,header=T, row.names="Gene") #read in the empirical data

test3 <- test[,colSums(test) > minLCC & colSums(test) < maxLCC ]

colnames(test3) <- as.integer(as.factor(colnames(test3)))

cts <- as.matrix(test3) # convert data to matrix

coldata <- data.frame("condition"=rep(1,length(test3[1,]))) # setup coldata

row.names(coldata) <- names(test3[1,]) # setup row names for column data

coldata$condition <- factor(coldata$condition) # make condition a factor

dds <- DESeqDataSetFromMatrix(countData = cts, # make a DESeq2 data set, a single condition
                              colData = coldata,
                              design = ~ 1)

dds <- estimateSizeFactors(dds, type="poscounts") # estimate size factors with DESeq2
dds <- estimateDispersions(dds) # estimate final dispersion values for all genes
dispersion <- as.numeric(paste(dds@dispersionFunction(1)))


row.data <- as.data.frame(rowData(dds)) # extract the DESeq2 dispersion and mean values

test <- read.table(opt$file,header=T) 
test2 <- melt(test,id.vars="Gene") # convert from wide to long format
dta1 <- data.frame("gene.id"=gsub("_\\d+$","",test2[,1],perl=T), # convert ACTG1_AJ_1 to ACTG1_AJ (assumes GeneName_effect_gRNAnumber) # in real(our) data, effect would not be known or included
                    "grna.id"=test2[,1], # adds ACTG1_AJ_1 for example
                    "lid.id"=as.integer(test2[,2]), # converts the lineage*sample barcode (20-bp) to an integer such as 1,2,3,etc
                    "cell.count"=test2[,3], # add the cell count data
                    "batch"=rep(1,length(test2[,1])))

# remove lineages with less/more, min/max cellcount-threshold
dta <- subset(dta1, cell.count > minLCC & cell.count < maxLCC)

# get list of genes from count matrix (all but the non-targeting gRNA)
#genes <- grep(nTstring, unique(dta$gene.id), invert = TRUE, value = TRUE)
genes <- unique(dta$gene.id)

# get a list of non-targeting gRNA
genes2 <- grep(nTstring, unique(dta$gene.id), invert = FALSE, value = TRUE)

# make dummy data.frame
results <- data.frame(gene = character(), logFC = double(), pval = double(), pvals = character(), stringsAsFactors=FALSE)

# loop over all genes
results <- foreach (gene.idx=1:length(genes), .combine=rbind) %dopar% {
    # get gene name
    gene <- genes[gene.idx]

    # subset data to current gene and controls
    gene.data <- dta[grepl(paste0(nTstring,"|",gene), dta$gene.id),]

    # get gRNAs names for current gene
    if( length(grep(nTstring, gene)) > 0){
        gRNAs <- grep(gene, unique(gene.data$grna.id), invert = FALSE, value = TRUE)    
    }else {
        gRNAs <- grep(nTstring, unique(gene.data$grna.id), invert = TRUE, value = TRUE)
    }
    # make dummy data.frames
    lowerTail.pvals <- rep(NA, length(gRNAs))
    logFCs          <- rep(NA, length(gRNAs))

    # loop over all gRNA of that gene
    for (grna.idx in 1:length(gRNAs)){

        # get gRNA name
        gRNA <- gRNAs[grna.idx]

        # subset data to current gRNA and controls
        gRNA.data <- gene.data[grepl(paste0(nTstring,"|",gRNA), gene.data$grna.id),]

        # get number of observed lineages of this gRNA)
        lineage.count <- length(gRNA.data[gRNA.data$grna.id == gRNA,]$lid.id)

        # lineages

        mean <- mean(gRNA.data[gRNA.data$grna.id == gRNA,]$cell.count, na.rm = FALSE, trim = winzorizer)

        # sample R-1 time lineage.count lineages from the non-targeting background lineages and determine their means
        means <- rep(NA, R-1)
        if (length(gRNA.data[grepl(nTstring, gRNA.data$grna.id),]$cell.count) < lineage.count){
            print("lineage.count > nTstring cell count, setting lineage.count = nTstring cell count")
            lineage.count = length(gRNA.data[grepl(nTstring, gRNA.data$grna.id),]$cell.count)
        }
        c = 0
        repeat{
            c = c + 1
            means[c] <- mean(sample(gRNA.data[grepl(nTstring, gRNA.data$grna.id),]$cell.count, lineage.count), na.rm = FALSE, trim = winzorizer)
            if (c == (R - 1)){
    	         break
    	    }
        }
        
        # determine empirical p-value by using negative binomial generalized linear model

        # make a new data.frame to add column saying if just the controls are Control:yes
        gRNAcontrol.data <- gRNA.data[grepl(nTstring, gRNA.data$grna.id),]
        gRNAcontrol.data$Control <- rep("yes",length(gRNAcontrol.data$grna.id))

        # make a new data.frame to add a column saying if just the gRNA is a Control:no
        gRNAobserved.data <- gRNA.data[grepl(gRNA, gRNA.data$grna.id),]
        gRNAobserved.data$Control <- rep("no",length(gRNAobserved.data$grna.id))

        # combine the two data.frames from the previous steps
        gRNA2.data <- rbind(gRNAobserved.data, gRNAcontrol.data)

        # make these things as factors
        gRNA2.data$lid.id <- as.factor(gRNA2.data$lid.id)
        gRNA2.data$Control <- as.factor(gRNA2.data$Control)
        gRNA2.data$batch <- as.factor(gRNA2.data$batch)

        # add on size factors from DESeq2
        size.factors <- rep('NA',length(gRNA2.data$lid.id))
        for (size.factor in 1:length(gRNA2.data$lid.id)){
            size.factors[size.factor] <- sizeFactors(dds)[gRNA2.data$lid.id[size.factor]]
        }
        gRNA2.data$size.factors <- as.numeric(size.factors)

        # now do a generalized linear model with a negative binomial function
        # and return the pvalue of the Control coefficient

        # if more than one batch then add batch to the model
        lowerTail.pvals[grna.idx] <- tryCatch(as.numeric(paste(coef(summary(glm(cell.count ~ Control + log(size.factors), family = negative.binomial(dispersion), data=gRNA2.data)))[,4][2])), error=function(err) 1)

        # calculate log2 Fold Change from bootstrapped control means and gRNA mean
        if(mean != 0){
            if(mean(means, na.rm = FALSE) != 0){
                logFCs[grna.idx] <- log2(mean/mean(means, na.rm = FALSE))
            } else{
                logFCs[grna.idx] <- log2(mean/1)
            }
        } else{
            if(mean(means, na.rm = FALSE) != 0){
                logFCs[grna.idx] <- log2(1/mean(means, na.rm = FALSE))
            } else{
                logFCs[grna.idx] <- log2(1/1)
            }
        }
        print(paste(gRNA, "pvals:", signif(lowerTail.pvals[grna.idx], digits=2), "log2FC:", signif(logFCs[grna.idx], digits=2)));
    }
    # merge pvalues accross all gRNA of a gene using fisher's method
    lowerTail.pval <- fishers_method(lowerTail.pvals)

    # average logFC from different gRNA weighted by 1-pvalue
    lowerTail.logFC <- mean(logFCs, na.rm = FALSE)
    lowerTail.logFC <- ifelse(is.na(lowerTail.logFC), 0, lowerTail.logFC)

    # dump results into df
    dummyLineLowerTail <- data.frame(gene = gene, logFC = lowerTail.logFC, pval = lowerTail.pval, pvals = paste(signif(lowerTail.pvals, digits = 2), collapse="|"), stringsAsFactors=FALSE)

    # this part is needed for the parallel foreach loop
    rbind(dummyLineLowerTail)
}

# correct p-values for multiple testing
results$padj <- p.adjust(results$pval, method = "BH")

# sort by gene name
results <- results[order(results$gene),]

# write table to file
fwrite(results, opt$out, row.names = FALSE, col.names = TRUE, sep = "\t")```

Use Snakemake to run pipeline to generate count table

Snakefile-fasta3 Click to expand!
#
shell.executable("/bin/bash")

# how many parts should the FASTQ be split into
SPLIT=16
scattergather:
    split=SPLIT

THREADS=24
MEM=64

# set input file name here
IDS, = glob_wildcards("{id}-orig-I1-I2-R1.fastq.gz")

rule all:
    input:
        raw = expand("{id}-raw.csv", id = IDS),
        filtered = expand("{id}-filtered.csv", id = IDS)

# white list
rule whitelist:
    input: "{id}-orig-I1-I2-R1.fastq.gz"
    output:"{id}-whitelist.txt"
    log:
        stdout = "{id}-whitelist.out",
        stderr = "{id}-whitelist.err"
    params: "{id}-whitelist-I1"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load pyston/2.3.4
        module load seqtk
        module load bcftools/1.14
        bgzip -cd -@4 {input} | \
        umi_tools whitelist --extract-method=regex \
        --bc-pattern="(?P<cell_1>.{{10}})(?P<cell_2>.{{10}})(?P<umi_1>.{{12}})" \
        --error-correct-threshold 3 \
        --subset-reads=1000000000 \
        --plot-prefix {params} \
        --random-seed=1 \
        --log={log.stdout} \
        --error={log.stderr} \
        --stdout={output}
        '''

# split input FASTQ into 4 parts for faster processing
rule split_fastq:
    input: "{id}-orig-I1-I2-R1.fastq.gz"
    output: scatter.split("{{id}}-orig-I1-I2-R1-{scatteritem}.fastq.gz")
    params: "{id}-orig-I1-I2-R1-%-of-" + str(SPLIT) + ".fastq.gz"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load bbtools/38.82
        module load bcftools/1.14
        partition.sh in={input} ways={SPLIT} out={params}
        for i in `seq 1 {SPLIT}|sort -gr`
        do
          j=$(($i-1))
          mv {wildcards.id}-orig-I1-I2-R1-$j-of-{SPLIT}.fastq.gz {wildcards.id}-orig-I1-I2-R1-$i-of-{SPLIT}.fastq.gz
        done
        '''

# extract/ error-correct barcodes, gRNAs
rule extract:
    input: 
        reads = "{id}-orig-I1-I2-R1-{scatteritem}.fastq.gz",
        whitelist = "{id}-whitelist.txt"
    output: "{id}-extract-{scatteritem}.fasta.gz"
    log:
        stdout = "{id}-extract-{scatteritem}.out",
        stderr = "{id}-extract-{scatteritem}.err"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load pyston/2.3.4
        module load seqtk
        module load bcftools/1.14
        module load openblas/0.3.20
        bgzip -cd -@4 {input.reads} | \
        umi_tools extract --extract-method=regex \
        --bc-pattern="(?P<cell_1>.{{10}})(?P<cell_2>.{{10}})(?P<umi_1>.{{12}})" \
        --error-correct-cell \
        --whitelist {input.whitelist} \
        --log={log.stdout} \
        --error={log.stderr} |seqtk seq -A |bgzip -c -@ 6 > {output}
        '''

## gather splits
rule gather_fasta:
    input:
        gather.split("{{id}}-extract-{scatteritem}.fasta.gz")
    output:
        "{id}-extract.fasta.gz"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load bcftools/1.14
        cat {input} > {output}
        '''

## make input for STAR
### step1 
rule STAR_input1:
    input: "{id}-extract.fasta.gz"
    output: "{id}-corr1-I1-I2-R1.names.fastq.gz"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load bcftools/1.14
        module load seqtk
        module load bbtools/38.82
        bgzip -cd -@4 {input} | \
        paste - - | \
        fgrep "AGGACGAAACACCG" | \
        perl -pe "s/>(\\S+)_(\\w+)_(\\w+)\\t(\\w+)/>\\1\\n\\4\\2\\3/" | \
        bbduk.sh int=f literal=AGGACGAAACACCG k=14 ktrim=l in=STDIN.fa out=STDOUT.fa threads=4 | \
        perl -pe "s/^([A-Z]+)([A-Z]{{20,20}})([A-Z]{{12,12}})$/\\3\\2\\1/" | \
        cut -c-52 | \
        seqtk seq -F ? - | \
        bgzip -c -@ 4 > {output}
        '''

## step2
rule STAR_input2:
    input: "{id}-corr1-I1-I2-R1.names.fastq.gz"
    output: "{id}-barcodes.txt"
    shell:
        """
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load bcftools/1.14
        module load seqtk
        bgzip -cd -@4 {input} | \
        seqtk trimfq -b 12 - | paste - - - - | cut -f 2 | cut -c-20 | \
        sort --parallel={THREADS} -T ./ -S {MEM}G | uniq -c | \
        perl -pe 's/( )+/\t/g'|cut -f 2-3 |awk '$1 >= 100' |cut -f 2 > {output}
        """

# STAR1
rule STAR1:
    input:
        reads = "{id}-corr1-I1-I2-R1.names.fastq.gz",
        barcodes = "{id}-barcodes.txt"
    output:
        map = directory("{id}-STAR/"),
        tmp = temp(directory("{id}-STAR-temp/")),
        bam = "{id}-STAR/Aligned.out.bam"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        /nfs/scistore16/itgrp/jelbers/git/STAR/source/STAR
        echo
        /nfs/scistore16/itgrp/jelbers/git/STAR/source/STAR \
        --outFileNamePrefix {output.map}/ \
        --outTmpDir {output.tmp}/ \
        --readFilesCommand zcat \
        --genomeDir genome-star-idx \
        --outSAMattributes All \
        --outSAMtype BAM Unsorted \
        --runThreadN {THREADS} \
        --readFilesIn {input.reads} \
        --soloStrand Unstranded \
        --soloType CB_UMI_Simple \
        --alignIntronMax 1 \
        --soloFeatures Gene \
        --soloUMIdedup 1MM_All \
        --soloCBstart 13 \
        --soloCBlen 20 \
        --soloUMIstart 1 \
        --soloUMIlen 12 \
        --soloBarcodeMate 1 \
        --clip5pNbases 32 \
        --soloCBwhitelist {input.barcodes} \
        --soloCBmatchWLtype Exact
        '''

# filterBAM
rule filterBAM:
    input: "{id}-STAR/Aligned.out.bam"
    output: "{id}-STAR/Aligned.out2.bam"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load samtools/1.14
        LC_ALL=C
        samtools view -h -@{THREADS} -q 255 {input} | \
        fgrep -f <(echo -e "@\\nMD:Z:20\\nMD:Z:19") - | \
        samtools view -@{THREADS} -Sb > {output}
        '''

# bam2fastq
rule BAM2FASTQ:
    input: "{id}-STAR/Aligned.out2.bam"
    output: "{id}-corr2-I1-I2-R1.names.fastq.gz"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load samtools/1.14
        module load compression-tools
        samtools fastq -@{THREADS} {input} | \
        pigz -p {THREADS} > {output}
        '''

# STAR2
rule STAR2:
    input:
        reads = "{id}-corr2-I1-I2-R1.names.fastq.gz",
        barcodes = "{id}-barcodes.txt"
    output:
        map = directory("{id}-STAR2/"),
        tmp = temp(directory("{id}-STAR2-temp/")),
        matrix = "{id}-STAR2/Solo.out/Gene/raw/matrix.mtx",
        features = "{id}-STAR2/Solo.out/Gene/raw/features.tsv",
        barcodes = "{id}-STAR2/Solo.out/Gene/raw/barcodes.tsv",
        matrix2 = "{id}-STAR2/Solo.out/Gene/filtered/matrix.mtx",
        features2 = "{id}-STAR2/Solo.out/Gene/filtered/features.tsv",
        barcodes2 = "{id}-STAR2/Solo.out/Gene/filtered/barcodes.tsv"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        /nfs/scistore16/itgrp/jelbers/git/STAR/source/STAR
        echo
        /nfs/scistore16/itgrp/jelbers/git/STAR/source/STAR \
        --outFileNamePrefix {output.map}/ \
        --outTmpDir {output.tmp}/ \
        --readFilesCommand zcat \
        --genomeDir genome-star-idx \
        --outSAMattributes All \
        --outSAMtype BAM Unsorted \
        --runThreadN {THREADS} \
        --readFilesIn {input.reads} \
        --soloStrand Unstranded \
        --soloType CB_UMI_Simple \
        --alignIntronMax 1 \
        --soloFeatures Gene \
        --soloUMIdedup 1MM_All \
        --soloCBstart 13 \
        --soloCBlen 20 \
        --soloUMIstart 1 \
        --soloUMIlen 12 \
        --soloBarcodeMate 1 \
        --clip5pNbases 32 \
        --soloCBwhitelist {input.barcodes} \
        --soloCBmatchWLtype Exact
        '''

# matrix to csv
rule mat2csv:
    input:
        matrix = "{id}-STAR2/Solo.out/Gene/raw/matrix.mtx",
        features = "{id}-STAR2/Solo.out/Gene/raw/features.tsv",
        barcodes = "{id}-STAR2/Solo.out/Gene/raw/barcodes.tsv"
    output: 
        csv = "{id}-raw.csv",
        matrix = "{id}-STAR2/Solo.out/Gene/raw/matrix.mtx.gz",
        features = "{id}-STAR2/Solo.out/Gene/raw/features.tsv.gz",
        barcodes = "{id}-STAR2/Solo.out/Gene/raw/barcodes.tsv.gz"
    params: "{id}-STAR2"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load cellranger
        module load compression-tools
        pigz -p {THREADS} -fk {input.matrix}
        pigz -p {THREADS} -fk {input.features}
        pigz -p {THREADS} -fk {input.barcodes}
        cellranger mat2csv {params}/Solo.out/Gene/raw/ {output.csv}
        '''

# matrix to csv
rule mat2csv2:
    input:
        matrix = "{id}-STAR2/Solo.out/Gene/filtered/matrix.mtx",
        features = "{id}-STAR2/Solo.out/Gene/filtered/features.tsv",
        barcodes = "{id}-STAR2/Solo.out/Gene/filtered/barcodes.tsv"
    output: 
        csv = "{id}-filtered.csv",
        matrix = "{id}-STAR2/Solo.out/Gene/filtered/matrix.mtx.gz",
        features = "{id}-STAR2/Solo.out/Gene/filtered/features.tsv.gz",
        barcodes = "{id}-STAR2/Solo.out/Gene/filtered/barcodes.tsv.gz"
    params: "{id}-STAR2"
    shell:
        '''
        hostname
        unset SLURM_EXPORT_ENV
        module purge
        module load cellranger
        module load compression-tools
        pigz -p {THREADS} -fk {input.matrix}
        pigz -p {THREADS} -fk {input.features}
        pigz -p {THREADS} -fk {input.barcodes}
        cellranger mat2csv {params}/Solo.out/Gene/filtered/ {output.csv}
        '''

make a directed acyclic graph for a diagram of the workflow

conda activate snakemake-7.8.2
snakemake --snakefile Snakefile-fasta3 --dryrun  --dag all \
> CRISPR-LICHT.dag
CRISPR-LICHT.dag3 Click to expand!
digraph snakemake_dag {
    graph[bgcolor=white, margin=0];
    node[shape=box, style=rounded, fontname=sans,                 fontsize=10, penwidth=2];
    edge[penwidth=2, color=grey];
	0[label = "ready to estimate log2FC and\np values from count table\nusing a negative\nbinomial generlized linear\nmodel with a common dispersion\nestimate and ability to add\n covariates such as batch\njust like DESeq2", color = "0.36 0.6 0.85", style="rounded,dashed"];
	2[label = "use STAR on the filtered reads\n to make count table", color = "0.10 0.6 0.85", style="rounded,dashed"];
	3[label = "convert filtered BAM to filtered reads", color = "0.62 0.6 0.85", style="rounded,dashed"];
	4[label = "filter BAM output from STAR to \nget rid of mismatches", color = "0.46 0.6 0.85", style="rounded,dashed"];
	5[label = "use STAR to estimate mismatches\nbetween reads and gRNAs\nno introns allowed", color = "0.00 0.6 0.85", style="rounded,dashed"];
	6[label = "convert to format for STAR", color = "0.31 0.6 0.85", style="rounded,dashed"];
	7[label = "combine 16 outputs from \numi_tools extract", color = "0.21 0.6 0.85", style="rounded,dashed"];
	8[label = "use umi_tools extract on each \nof the 16 input files\n and the whitelist\nto correct barcodes", color = "0.41 0.6 0.85", style="rounded,dashed"];
	9[label = "split the original files\ninto 16 pieces for\n16x faster processing", color = "0.26 0.6 0.85", style="rounded,dashed"];
	10[label = "use umi_tools whitelist\non original reads\nwith up to 3 errors\nin combined 20-bp\nsample*lineage barcodes\nto make a whitelist", color = "0.51 0.6 0.85", style="rounded,dashed"];
	11[label = "make barcode list for STAR", color = "0.05 0.6 0.85", style="rounded,dashed"];
	12[label = "convert count table from matrix\nformat to csv format", color = "0.15 0.6 0.85", style="rounded,dashed"];
	13[label = "get original reads\nIndex1: 10-bp lineage barcode\nIndex2: 10-bp sample barcode\nR1 (51-bp total):\n12-bp UMI,\nsome random seq,\na constant seq,\nimmediately followed by\nthe 19-20-bp gRNA", color = "0.15 0.6 0.85", style="rounded,dashed"];
	14[label = "make a sequence reference for \nSTAR using the gRNAs", color = "0.15 0.6 0.85", style="rounded,dashed"];
	12 -> 0
	3 -> 2
	11 -> 2
	4 -> 3
	5 -> 4
	6 -> 5
	11 -> 5
	7 -> 6
	8 -> 7
	9 -> 8
	10 -> 8
	6 -> 11
	2 -> 12
	13 -> 9
	13 -> 10
	14 -> 2
	14 -> 5
}

View the workflow in a diagram form

cat CRISPR-LICHT.dag3 | \
dot -Tsvg

workflow1

run the workflow

snakemake -j 30 --snakefile Snakefile-fasta3 \
--printshellcmds --latency-wait 60 --local-cores 4 --cores all \
--cluster "sbatch --export=NONE --no-requeue --job-name {rule} --mem=64g \
--time=2:00:00 --cpus-per-task=12 --constraint=avx2 --partition=gpu" all > Snakefile.log 2>&1 &

estimate p values and log2FC with all non-targeting controls gRNAs and no tomato

CRISPR-LICHT-8July2022raw.slurm

#!/bin/bash
#SBATCH --job-name=raw
#SBATCH --mem=32000
#SBATCH -c 32
#SBATCH --time=8:00:00
#SBATCH --ntasks=1

unset SLURM_EXPORT_ENV
module purge
. "/nfs/scistore16/itgrp/jelbers/miniconda3/etc/profile.d/conda.sh"
conda activate crispr-licht-04May2022

cat SRR11868220-raw.csv |tr ',' '\t' |perl -pe "s/^\t/Gene\t/g" |fgrep -v "tomato" > SRR11868220-raw.notomato.tsv

Rscript CRISPR-LICHT-8July2022.R --out CRISPR-LICHT-8July2022nC.SRR11868220-raw.notomato.tsv.pvalues.tsv \
--nTstring nC --file SRR11868220-raw.notomato.tsv \
> CRISPR-LICHT-8July2022nC.SRR11868220-raw.notomato.tsv.pvalues.log 2>&1

submit slurm job

sbatch CRISPR-LICHT-8July2022raw.slurm

graph raw p values

grep -Po "\S+\spvals:\s\S+\slog2FC:\s+\S+" \
CRISPR-LICHT-8July2022nC.SRR11868220-raw.notomato.tsv.pvalues.log| \
perl -pe "s/\"(\S+)\spvals:\s(\S+)\slog2FC:\s(\S+)\"/\1\t\2\t\3/g"|sort |sed '1i gRNA\tpvals\tlog2FC' > all-non-targeting-gRNAs.tsv
all-non-targeting-gRNAs.tsv Click to expand!
gRNA                   pvals    log2FC
ACTG1_AJ_1             0.36     -1.8
ACTG1_AJ_3             0.71     -0.4
ACTG1_AJ_4             0.32     -4.8
AKT3_LOE3_2            0.041    -3.3
AKT3_LOE3_3            0.31     -0.041
ALG12_LOE3_2           0.041    -4.6
ALG12_LOE3_3           0.51     -1.9
ALG12_LOE3_4           0.14     -5
ALG1_LOE2_1            0.5      0.71
ALG1_LOE2_4            0.41     -2.6
ALG3_LOE3_1            0.41     -5
ALG3_LOE3_2            0.16     -4.4
ALG3_LOE3_4            0.41     -5.1
ANKRD11_AJ_1           0.18     -5.1
ANKRD11_AJ_2           0.58     -3
ANKRD11_AJ_3           0.15     -3.2
ANKRD11_AJ_4           0.41     -5.2
ARFGEF2_LOE2_1         0.14     -5.7
ARFGEF2_LOE2_3         0.42     -5.9
ARFGEF2_LOE2_4         0.37     -2.8
ARNT2_LOE3_1           0.1      -2.9
ASNS_LOE3_1            0.4      1.8
ASNS_LOE3_4            0.44     -5.9
ASPM_LOE1_4            0.35     -3.4
ASXL3_LOE3_1           0.45     -5.9
ASXL3_LOE3_2           0.5      -2.3
ASXL3_LOE3_4           0.66     -3.7
ATM_LOE1_3             0.073    -3.8
ATM_LOE1_4             0.72     -0.49
ATRIP_AJ_1             0.31     -4.6
ATRIP_AJ_2             0.2      -3.9
ATRIP_AJ_3             0.37     -3.8
ATRIP_AJ_4             0.22     -4.5
ATR_LOE2_1             0.45     -5.9
ATR_LOE2_2             0.54     -2.5
ATR_LOE2_4             0.45     1.3
ATRX_AJ_2              0.41     -5.6
ATRX_AJ_3              0.45     -5.8
ATRX_AJ_4              0.19     -5
BCOR_LOE2_2            0.41     -5.4
BCOR_LOE2_3            0.28     -3.1
BCOR_LOE2_4            0.68     -0.27
BLM_AJ_1               0.074    -5.5
BLM_AJ_4               0.37     -4.7
BRF1_LOE2_1            0.62     -5.8
CASC5_LOE3_2           0.62     -5.9
CASK_LOE1_1            0.75     -3.2
CASK_LOE1_4            0.58     -0.93
CCDC88A_LOE3_1         0.62     -5.4
CCDC88A_LOE3_2         0.28     -5
CDC45_AJ_2             0.67     -4.6
CDC45_AJ_3             0.22     -3.6
CDC45_AJ_4             0.43     -5.6
CDC6_LOE3_1            0.08     4.4
CDC6_LOE3_2            0.44     -5.8
CDC6_LOE3_3            0.11     -4
CDK5RAP2_LOE1_1        0.25     -3.3
CDK5RAP2_LOE1_2        0.62     -5.1
CDK5RAP2_LOE1_4        0.39     -5.2
CDK6_LOE2_1            0.092    -4.6
CDKL5_AJ_1             0.6      -5.9
CDKL5_AJ_2             0.42     -2.7
CDKL5_AJ_3             0.12     2.8
CDKL5_AJ_4             0.035    -4.2
CDT1_LOE3_1            0.14     -4.3
CDT1_LOE3_4            0.29     -5.4
CENPE_LOE2_2           0.64     -2.1
CENPE_LOE2_3           0.3      -5.1
CENPE_LOE2_4           0.36     -5.8
CENPF_LOE3_3           0.44     0.64
CENPF_LOE3_4           0.45     -5.8
CENPJ_LOE2_2           0.38     -5.4
CENPJ_LOE2_3           0.41     1.2
CENPJ_LOE2_4           0.41     -2.4
CEP135_LOE2_2          0.9      -0.63
CEP135_LOE2_3          0.45     -5.8
CEP152_LOE2_3          0.38     -2.9
CEP63_LOE3_1           0.45     -5.9
CEP63_LOE3_2           0.72     -3.3
CEP63_LOE3_3           0.38     -5
CEP63_LOE3_4           0.41     -1.1
CIT_LOE3_2             0.39     -3.3
CIT_LOE3_3             0.94     -3.5
CIT_LOE3_4             0.41     -5.6
COG7_LOE3_1            0.44     -5.8
COG7_LOE3_2            0.26     -5.7
COG7_LOE3_3            0.71     -0.32
CREBBP_LOE3_1          0.39     -3.4
CREBBP_LOE3_2          0.11     -2.4
CREBBP_LOE3_3          0.39     -3.7
CREBBP_LOE3_4          0.39     -3.9
CSNK2A1_LOE3_2         0.38     -4.9
CTCF_LOE3_1            0.6      -4.5
CTNNB1_LOE3_1          0.26     -4.3
CTNNB1_LOE3_3          0.43     -5.6
CTNNB1_LOE3_4          0.14     -3.6
DDB2_LOE1_1            0.44     -5.9
DDB2_LOE1_2            0.33     -4.3
DDB2_LOE1_3            0.42     -5.8
DDB2_LOE1_4            0.16     -3.1
DHCR7_LOE3_1           0.6      -5.1
DHCR7_LOE3_3           0.42     -5.6
DHCR7_LOE3_4           0.39     -4.4
DKC1_LOE3_1            0.3      -2.2
DNA2_AJ_2              0.2      -4.6
DNM1L_LOE3_1           0.46     -5.7
DNM1L_LOE3_2           0.63     -1.2
DNM1L_LOE3_4           0.43     -2.7
DONSON_AJ_1            0.41     -3.2
DONSON_AJ_2            1        0.44
DONSON_AJ_4            0.31     -5.1
DPAGT1_LOE3_1          0.42     -5.8
DPAGT1_LOE3_2          0.11     -4.9
DPAGT1_LOE3_4          0.32     -5.5
DPM1_LOE3_1            0.19     -3.5
DPM1_LOE3_2            0.14     -3.4
DPM1_LOE3_3            0.45     -5.9
DPM1_LOE3_4            0.31     -2.1
DPP6_AJ_2              0.16     -3.2
DPP6_AJ_4              0.39     0.92
DYM_LOE3_1             0.12     -4.3
DYM_LOE3_3             0.81     0.96
DYNC1H1_LOE3_4         0.43     -5.6
DYRK1A_LOE1_3          0.31     -5.3
EEF1A2_LOE3_3          0.61     -1.2
EEF1A2_LOE3_4          0.41     -5.6
EFTUD2_LOE3_1          0.16     -3.3
EFTUD2_LOE3_2          0.42     -3.2
EIF2AK3_LOE3_2         0.43     -3
EIF2AK3_LOE3_3         0.75     -0.72
EIF2AK3_LOE3_4         0.41     0.15
EIF6_dC_1              0.18     -4.3
EIF6_dC_2              0.092    -4.9
EOMES_LOE3_1           0.8      -0.15
EOMES_LOE3_2           0.55     1.5
EOMES_LOE3_4           0.45     -1.8
EP300_LOE3_2           0.4      -5.3
EP300_LOE3_3           0.38     -4.5
EP300_LOE3_4           0.41     -5.6
ERCC1_LOE1_1           0.26     -5.7
ERCC1_LOE1_2           0.4      -4.7
ERCC1_LOE1_3           0.11     -4.4
ERCC1_LOE1_4           0.73     1.1
ERCC2_LOE1_1           0.31     -2.3
ERCC2_LOE1_2           0.4      -5
ERCC3_LOE1_1           0.43     -5.9
ERCC4_LOE1_1           0.38     -4.4
ERCC4_LOE1_2           0.36     1.3
ERCC5_LOE1_1           0.99     1.3
ERCC5_LOE1_3           0.64     -5.8
ERCC6_LOE2_1           0.82     -2.6
ERCC6_LOE2_3           0.35     -4.3
ERCC6_LOE2_4           0.38     -5
ERCC8_LOE1_1           0.43     -5.6
ERCC8_LOE1_3           0.41     -5
ESCO2_AJ_3             0.39     -4.9
ESCO2_AJ_4             0.64     -5.8
FGFR1_LOE2_1           0.32     -4.5
FGFR1_LOE2_2           0.27     2.3
FGFR1_LOE2_4           0.61     0.25
FOXG1_LOE2_3           0.024    2.3
GAPDH_dC_2             0.21     -4.4
GMNN_AJ_2              0.33     1.3
GMNN_AJ_4              0.6      -5
HCCS_LOE2_2            0.13     -3
HCCS_LOE2_3            0.16     -4
HDAC8_AJ_2             0.6      -5.9
HDAC8_AJ_4             0.31     -4.6
HMGA2_AJ_2             0.29     -4.8
HMGA2_AJ_3             0.52     -3.2
HMGA2_AJ_4             0.043    -3
h_non-targeting_nC_10  0.5      1.4
h_non-targeting_nC_2   0.56     -1.2
h_non-targeting_nC_3   0.33     -5.4
h_non-targeting_nC_4   0.46     -2
h_non-targeting_nC_6   0.4      -1.8
h_non-targeting_nC_9   0.61     -5.3
hrrm1_dC_1             0.64     -1.2
hrrm1_dC_2             0.13     -5.3
hrrm1_dC_4             0.42     1.6
IER3IP1_LOE3_2         0.45     -5.9
IER3IP1_LOE3_3         0.38     -3.3
IER3IP1_LOE3_4         1.5e-19  1.2
IGF1_AJ_1              0.43     -5.6
IGF1_AJ_2              0.91     -2.1
IGF1R_AJ_2             0.53     -5.7
IGF2_AJ_1              0.23     -4.4
IGF2_AJ_2              0.44     2.7
IGF2_AJ_4              0.18     4.4
IQSEC2_LOE3_1          0.12     -3.3
IQSEC2_LOE3_2          0.17     -2
IQSEC2_LOE3_3          0.45     -5.9
KAT6A_LOE2_2           0.68     -4.4
KAT6A_LOE2_4           0.44     -5.9
KAT6B_AJ_1             0.2      -4.4
KAT6B_AJ_2             0.077    -3.4
KAT6B_AJ_4             0.84     -0.42
KATNB1_LOE2_1          0.096    3
KATNB1_LOE2_2          0.43     1.2
KCNJ6_LOE3_2           0.54     -1.9
KCNJ6_LOE3_3           0.068    -3.5
KDM5C_LOE3_1           0.29     2
KDM5C_LOE3_2           0.73     0.71
KDM5C_LOE3_3           0.71     -3.6
KIF11_LOE2_1           0.21     -3.2
KIF11_LOE2_2           0.17     -5.3
KIF11_LOE2_3           0.55     0.98
KIF14_LOE3_1           0.43     -5.6
KIF14_LOE3_3           0.44     -5.8
KIF5C_LOE3_1           0.75     0.2
KIF5C_LOE3_2           0.45     -5.8
KIF5C_LOE3_4           0.21     -5.6
KMT2A_AJ_2             0.21     -4.2
KMT2A_AJ_3             0.66     -0.16
LARP7_AJ_1             0.97     -1.8
LARP7_AJ_3             0.58     -5.6
LARP7_AJ_4             0.39     -4.2
LIG4_LOE1_1            0.45     -5.9
LIG4_LOE1_2            0.28     -5.2
MCM4_AJ_1              0.32     -5
MCPH1_LOE1_1           0.16     -3.2
MCPH1_LOE1_2           0.58     -3.4
MCPH1_LOE1_3           0.43     -5.6
MCPH1_LOE1_4           0.34     -4.6
MECP2_LOE1_1           0.35     -2
MECP2_LOE1_2           0.64     -1.3
MECP2_LOE1_3           0.31     -5.3
MECP2_LOE1_4           0.4      -4.8
MED12_AJ_1             0.16     -2.8
MED12_AJ_3             0.92     -1.3
MED12_AJ_4             0.57     -1.7
MED17_LOE3_1           0.51     1.2
MED17_LOE3_2           0.27     -4.2
MED17_LOE3_3           0.31     -4.6
MED17_LOE3_4           0.43     -5.8
MED25_LOE3_1           0.027    -4.3
MED25_LOE3_2           0.43     -5.8
MED25_LOE3_3           0.42     -5.9
MRE11A_AJ_2            0.38     -5
MYCN_LOE2_2            0.99     -0.97
MYCN_LOE2_3            0.72     -0.82
MYCN_LOE2_4            0.61     -4.7
MYO18B_LOE3_1          0.39     -5.1
MYO18B_LOE3_2          0.35     -4.5
MYO18B_LOE3_3          0.21     -5.2
MYO18B_LOE3_4          0.2      -3.1
NAA10_LOE3_2           0.4      -4.3
NAA10_LOE3_3           0.58     0.37
NBN_LOE1_2             0.37     -5.2
NBN_LOE1_4             0.32     -5.9
NDE1_LOE2_2            0.0099   -3.6
NDE1_LOE2_3            0.42     -5.2
NDE1_LOE2_4            0.2      -4.7
NGLY1_LOE3_1           0.39     -4.2
NGLY1_LOE3_3           0.5      -4.6
NGLY1_LOE3_4           0.33     2.3
NHEJ1_LOE3_1           0.28     -3.4
NHEJ1_LOE3_2           0.6      -5.8
NIPBL_LOE2_3           0.72     1.3
NIPBL_LOE2_4           0.19     -5.2
NSDHL_LOE3_2           0.62     -1.3
NSDHL_LOE3_4           0.094    -3.4
NSUN2_AJ_1             0.6      -5.8
NSUN2_AJ_4             0.86     -2.3
OCLN_LOE3_3            0.43     -5.8
OCLN_LOE3_4            0.31     -5.5
ORC1_LOE3_3            0.58     -5
ORC1_LOE3_4            0.3      -4.8
ORC4_LOE3_2            0.92     -1.7
ORC4_LOE3_4            0.2      -5
ORC6_LOE3_1            0.22     -5.7
PAFAH1B1_LOE1_3        0.91     -0.48
PCNT_LOE1_1            0.44     -5.9
PCNT_LOE1_3            0.39     -5.4
PCNT_LOE1_4            0.73     -3.2
PGAP3_LOE3_3           0.2      -4.9
PGAP3_LOE3_4           0.029    -5.4
PHC1_AJ_2              0.89     -0.66
PHC1_AJ_3              0.34     -5.6
PIGA_LOE3_2            0.27     -4
PIGA_LOE3_4            0.55     -1.9
PLK4_LOE2_1            0.23     -5.3
PLK4_LOE2_2            0.42     -5.9
PLK4_LOE2_3            0.51     1.6
PLK4_LOE2_4            0.9      -0.088
PNKP_LOE2_1            0.22     1.8
PNKP_LOE2_3            0.44     -5.8
PNKP_LOE2_4            0.54     0.95
POC1A_AJ_1             0.23     -2.3
POC1A_AJ_2             0.38     -4.1
POLH_LOE1_2            0.27     -5.7
POLH_LOE1_3            0.096    -3.4
POLR2D_dC_1            0.22     -5.5
POLR2D_dC_2            0.21     -3
PQBP1_LOE2_1           0.45     1.2
PQBP1_LOE2_2           0.61     -4.6
PQBP1_LOE2_3           0.3      -5.4
PQBP1_LOE2_4           0.28     -3.4
PRKDC_AJ_1             0.43     -5.6
PRKDC_AJ_2             0.46     -5.7
PRKDC_AJ_4             0.28     -5.1
PRPF19_dC_2            0.12     -4.5
PTEN_pC_3              0.046    -3.5
PYCR2_LOE3_1           0.64     -5.8
PYCR2_LOE3_2           0.27     1.3
PYCR2_LOE3_4           0.08     -4.9
QARS_LOE3_1            0.93     0.57
QARS_LOE3_2            0.4      -4.5
QARS_LOE3_3            0.13     -3.7
RAB18_LOE3_3           0.38     1.5
RAB18_LOE3_4           0.14     -3.6
RAB3GAP1_LOE3_1        0.75     -0.6
RAB3GAP1_LOE3_2        0.39     -4.2
RAB3GAP1_LOE3_4        0.47     -2.5
RAB3GAP2_LOE3_1        0.3      -5.7
RAB3GAP2_LOE3_2        0.84     -1.6
RAB3GAP2_LOE3_3        0.76     -0.62
RAB3GAP2_LOE3_4        0.53     -2.7
RAD21_AJ_1             0.75     -3
RAD21_AJ_4             0.39     -4.6
RAN_dC_2               0.56     -1.6
RARS2_LOE3_2           0.38     -4.8
RARS2_LOE3_3           0.65     -1.1
RARS2_LOE3_4           0.61     -4.6
RBBP8_LOE3_1           0.43     -5.6
RBBP8_LOE3_2           0.05     -4.3
RBBP8_LOE3_3           0.38     -4.1
RBBP8_LOE3_4           0.3      -5.7
RBM10_AJ_1             0.55     0.49
RBM10_AJ_2             0.42     -5.3
RBM10_AJ_3             0.39     -3.6
RBM10_AJ_4             0.89     0.59
RGS6_LOE3_1            0.11     -5.3
RGS6_LOE3_2            0.16     -3.8
RGS6_LOE3_4            0.19     -4.4
RIMS3_LOE3_1           0.026    -4.3
RIMS3_LOE3_4           0.38     -1.9
RPL32_dC_1             0.39     -3.9
RPS6KA3_LOE2_1         0.04     -3.3
RPS6KA3_LOE2_2         0.18     -5.1
RPS6KA3_LOE2_3         0.42     -3
RTTN_LOE3_1            0.18     -5.6
RTTN_LOE3_2            0.4      -4.9
SHH_LOE1_3             0.41     -5.1
SHH_LOE1_4             0.81     -2.6
SIN3A_LOE3_3           0.6      -5.6
SIN3A_LOE3_4           0.33     -2.8
SLC1A4_LOE3_1          0.35     2.3
SLC1A4_LOE3_4          0.61     -4.8
SLC25A19_LOE2_1        0.3      -5.1
SLC25A19_LOE2_4        0.11     -3.3
SLC25A22_LOE3_2        0.063    -3.9
SLC25A22_LOE3_3        0.52     -2
SLC25A22_LOE3_4        0.41     -5.7
SLC2A1_AJ_1            0.43     -5.6
SLC2A1_AJ_3            0.4      -4.9
SLC2A1_AJ_4            0.65     -5.1
SLC9A6_AJ_4            0.6      -5.2
SMARCAL1_AJ_1          0.17     2.6
SMARCAL1_AJ_3          0.7      0.28
SMARCAL1_AJ_4          0.23     -3.2
SMC1A_LOE2_1           0.67     -4
SMC1A_LOE2_2           0.5      0.92
SMC1A_LOE2_4           0.51     -2.5
SNAP29_LOE2_3          0.4      -4.8
SNAP29_LOE2_4          0.089    -5.1
SNRPF_dC_1             0.1      -5
SPOCK1_LOE3_1          0.46     -1.6
SPOCK1_LOE3_2          0.41     -3
SPOCK1_LOE3_3          0.38     -3.7
SPOCK1_LOE3_4          0.27     0.77
SRCAP_AJ_1             0.61     -4
SRCAP_AJ_4             0.34     -2.2
SSR4_LOE3_2            0.21     -4.6
SSR4_LOE3_3            0.41     -5.6
STAMBP_LOE2_2          0.73     0.24
STAMBP_LOE2_3          0.76     -3.1
STIL_LOE1_2            0.6      -3.9
STIL_LOE1_3            0.87     -0.31
STIL_LOE1_4            0.24     -0.45
TAF2_LOE3_1            1        0.44
TAF2_LOE3_2            0.27     -2.8
TagBFP_nC_1            0.42     -5.2
TagBFP_nC_2            0.5      2
TBC1D20_AJ_1           0.95     0.08
TBC1D20_AJ_2           0.4      -3.2
TBC1D20_AJ_3           0.091    -5.2
TBC1D20_AJ_4           0.65     -4.4
TBCD_LOE3_1            0.42     -5.8
TBCD_LOE3_2            0.52     0.87
TCF4_AJ_1              0.45     -5.8
TCF4_AJ_2              0.13     -4.5
TCF4_AJ_3              0.84     0.67
TRAIP_AJ_1             0.22     -5.4
TRAIP_AJ_2             0.1      -5.3
TRAIP_AJ_3             0.032    -4.6
TRAPPC9_LOE2_2         0.17     2.1
TRAPPC9_LOE2_3         0.4      0.24
TRAPPC9_LOE2_4         0.69     -1
TRIO_LOE3_1            0.71     -1.3
TRIO_LOE3_2            0.83     1.1
TRIO_LOE3_3            0.67     -3.9
TRIO_LOE3_4            0.26     -5.8
TRMT10A_LOE3_1         0.39     -5.2
TRMT10A_LOE3_4         0.6      -5
TUBB2B_LOE3_2          0.39     -4.6
TUBB2B_LOE3_3          0.62     -5.6
TUBB_LOE3_2            0.37     -4.6
TUBB_LOE3_3            0.42     -5.9
TUBG1_LOE3_1           0.5      -4.5
TUBG1_LOE3_2           0.53     1.4
TUBG1_LOE3_3           0.41     -5.6
TUBGCP6_LOE2_1         0.23     -5.7
TUBGCP6_LOE2_3         0.57     -4.6
UBE3A_LOE1_1           0.58     -5.2
UBE3A_LOE1_2           0.19     -5.2
UBE3A_LOE1_3           0.24     -5.4
VPS13B_LOE2_3          0.11     -3.8
VRK1_LOE3_1            0.25     -5.5
VRK1_LOE3_2            0.52     1.4
VRK1_LOE3_3            0.4      -5
VRK1_LOE3_4            0.59     1.7
WDR4_LOE3_1            0.32     -5.6
WDR4_LOE3_2            0.53     2.2
WDR4_LOE3_3            0.37     -3.8
WDR4_LOE3_4            0.39     -3.3
WDR62_LOE1_1           0.4      -4
WDR62_LOE1_2           0.8      -2.8
WDR73_LOE2_2           0.14     -4.9
WDR73_LOE2_4           0.38     -4
WWOX_LOE3_2            0.37     -3.9
WWOX_LOE3_3            0.097    -4.4
XPA_LOE1_2             0.26     -0.1
XPA_LOE1_4             0.07     -4
XPC_LOE1_1             0.35     -3.7
XPC_LOE1_3             0.62     -5.8
XPC_LOE1_4             0.21     3.2
XRCC4_AJ_1             0.37     1.7
XRCC4_AJ_2             0.78     -0.65
XRCC4_AJ_3             0.68     -0.84
ZBTB18_LOE3_1          0.28     -4.4
ZBTB18_LOE3_2          0.06     2.7
ZBTB18_LOE3_3          0.43     -5.8
ZBTB18_LOE3_4          0.41     -5.6
ZEB2_LOE3_3            0.075    2.8
ZEB2_LOE3_4            0.16     -5.2

in an R shell

test <- read.table('all-non-targeting-gRNAs.tsv', header=T)

svg('1.svg')
test$colors = grepl("nC",test$gRNA)
test$colors[test$colors == 'TRUE'] = "red"
test$colors[test$colors == 'FALSE'] = "black"
plot(x=test$log2FC,y=-1*log10(test$pvals), col=test$colors, main="non-targeting controls(should have Log2FC=0)=red\nIER3IP1_LOE3_4 is highest point")
dev.off()

svg('2.svg')
test$colors = grepl("pC",test$gRNA)
test$colors[test$colors == 'TRUE'] = "red"
test$colors[test$colors == 'FALSE'] = "black"
plot(x=test$log2FC,y=-1*log10(test$pvals), col=test$colors, main="positive controls (should have +Log2FC)=red\nIER3IP1_LOE3_4 is highest point")
dev.off()

svg('3.svg')
test$colors = grepl("dC",test$gRNA)
test$colors[test$colors == 'TRUE'] = "red"
test$colors[test$colors == 'FALSE'] = "black"
plot(x=test$log2FC,y=-1*log10(test$pvals), col=test$colors, main="negative controls (should have -Log2FC)=red\nIER3IP1_LOE3_4 is highest point")
dev.off()

svg('4.svg')
test <- read.table('all-non-targeting-gRNAs.tsv', header=T)
test <- test[test$pvals > 0.000001,]
test$colors = grepl("nC",test$gRNA)
test$colors[test$colors == 'TRUE'] = "red"
test$colors[test$colors == 'FALSE'] = "black"
plot(x=test$log2FC,y=-1*log10(test$pvals), col=test$colors, main="non-targeting controls (should have Log2FC=0)=red\nwithout IER3IP1_LOE3_4, found\nsignificant in original paper")
dev.off()

svg('5.svg')
test$colors = grepl("pC",test$gRNA)
test$colors[test$colors == 'TRUE'] = "red"
test$colors[test$colors == 'FALSE'] = "black"
plot(x=test$log2FC,y=-1*log10(test$pvals), col=test$colors, main="positive controls (should have +Log2FC)=red\nwithout IER3IP1_LOE3_4, found\nsignificant in original paper")
dev.off()

svg('6.svg')
test$colors = grepl("dC",test$gRNA)
test$colors[test$colors == 'TRUE'] = "red"
test$colors[test$colors == 'FALSE'] = "black"
plot(x=test$log2FC,y=-1*log10(test$pvals), col=test$colors, main="negative controls (should have -Log2FC)=red\nwithout IER3IP1_LOE3_4, found\nsignificant in original paper")
dev.off()

plots

1

plot 1 non-targeting controls - all data

2

plot 2 growth controls - all data

3

plot 3 death controls - all data

4

plot 4 non-targeting controls - removed highest point data

5

plot 5 growth controls - removed highest point data

6

plot 6 death controls - removed highest point data

crispr-licht's People

Contributors

jelber2 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.