CRISPResso2

CRISPResso2 is a software pipeline designed to enable rapid and intuitive interpretation of genome editing experiments. A limited web implementation is available at: https://crispresso2.pinellolab.org/.

Briefly, CRISPResso2:

  • aligns sequencing reads to a reference sequence
  • quantifies insertions, mutations and deletions to determine whether a read is modified or unmodified by genome editing
  • summarizes editing results in intuitive plots and datasets

What can I do with CRISPResso2?

CRISPResso2 can be used to analyze genome editing outcomes using cleaving nucleases (e.g. Cas9 or Cpf1) or noncleaving nucleases (e.g. base editors). The following operations can be automatically performed:

  • filtering of low-quality reads
  • adapter trimming
  • alignment of reads to one or multiple reference sequences (in the case of multiple alleles)
  • quantification of HDR and NHEJ outcomes (if the HDR sequence is provided)
  • quantification of frameshift/in-frame mutations and identification of affected splice sites (if an exon sequence is provided)
  • visualization of the indel distribution and position (for cleaving nucleases)
  • visualization of distribution and position of substitutions (for base editors)
  • visualization of alleles and their frequencies

In addition, CRISPResso can be run as part of a larger tool suite:

  • CRISPRessoBatch - for analyzing and comparing multiple experimental conditions at the same site
  • CRISPRessoPooled - for analyzing multiple amplicons from a pooled amplicon sequencing experiment
  • CRISPRessoWGS - for analyzing specific sites in whole-genome sequencing samples
  • CRISPRessoCompare - for comparing editing between two samples (e.g., treated vs control)
  • CRISPRessoAggregate - for aggregating results from previously-run CRISPResso analyses

CRISPResso2 processing

CRISPResso2 Schematic

Quality filtering

Input reads are first filtered based on the quality score (phred33) in order to remove potentially false positive indels. The filtering thresholds can be changed by adjusting the optional parameters (see additional notes below).
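As a rough illustration of what a phred33 quality filter does, the sketch below decodes a FASTQ quality string and applies an average and a per-base threshold, mirroring the -q and -s parameters described later. The function names are hypothetical and this is not CRISPResso2's own implementation.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_quality(quality_string, min_average=0, min_single_bp=0):
    """Return True if a read meets both thresholds (hypothetical helper).

    min_average mirrors -q/--min_average_read_quality and
    min_single_bp mirrors -s/--min_single_bp_quality.
    """
    scores = phred33_scores(quality_string)
    if not scores:
        return False  # assumption: empty reads are discarded
    average = sum(scores) / len(scores)
    return average >= min_average and min(scores) >= min_single_bp
```

With the default thresholds of 0, every non-empty read passes, which matches the documented defaults of no filtering.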

Adapter trimming

Next, adapters are trimmed from the reads. If no adapters are present, select 'No Trimming' under the 'Trimming adapter' heading in the optional parameters. If reads contain adapter sequences that need to be trimmed, select the adapters used for trimming under the 'Trimming adapter' heading in the optional parameters. Possible adapters include Nextera PE, TruSeq3 PE, TruSeq3 SE, TruSeq2 PE, and TruSeq2 SE. The adapters are trimmed from the reads using fastp.

Read merging

If paired-end reads are provided, reads are merged using fastp. This produces a single read for alignment to the amplicon sequence, and reduces sequencing errors that may be present at the end of sequencing reads.

Alignment

The preprocessed reads are then aligned to the reference sequence with a global sequence alignment algorithm that takes into account our biological knowledge of nuclease function. If multiple alleles are present at the editing site, each allele can be passed to CRISPResso2 and sequenced reads will be assigned to the reference sequence of origin.

Visualization and analysis

Finally, after analyzing the aligned reads, a set of informative graphs is generated, allowing for the quantification and visualization of the position and type of outcomes within the amplicon sequence.

How is CRISPResso2 different from CRISPResso?

CRISPResso2 introduces four key innovations for the analysis of genome editing data:

  1. Comprehensive analysis of sequencing data from base editors. We have added additional analysis and visualization capabilities especially for experiments using base editors.
  2. Allele specific quantification of heterozygous references. If the targeted editing region has more than one allele, reads arising from each allele can be deconvoluted.
  3. A novel biologically-informed alignment algorithm. This algorithm incorporates knowledge about the mutations produced by gene editing tools to create more biologically-likely alignments.
  4. Ultra-fast processing time.

Installation

CRISPResso2 can be installed using the conda package manager Bioconda, or it can be run using the Docker containerization system.

Bioconda

To install CRISPResso2 using Bioconda, download and install Anaconda Python, following the instructions at: https://www.anaconda.com/distribution/.

Open a terminal and type:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

To install CRISPResso2 into the current conda environment, type:

conda install CRISPResso2

Alternatively, to create a new environment named crispresso2_env with CRISPResso2, type:

conda create -n crispresso2_env -c bioconda crispresso2

Activate your conda environment:

conda activate crispresso2_env

Verify that CRISPResso is installed using the command:

CRISPResso -h

Bioconda for Apple Silicon

If you would like to install CRISPResso using Bioconda on a Mac with Apple silicon, there is a slight change you need to make. First, ensure that you have Rosetta installed. Next, you must tell Bioconda to install the Intel versions of the packages. If you would like to do this system-wide, which we recommend, run the command:

conda config --add subdirs osx-64

Then you can proceed with the installation instructions above.

If you would like to use the Intel versions in a single environment, then run:

CONDA_SUBDIR=osx-64 conda create -n crispresso2_env -c bioconda crispresso2

If you choose to use the CONDA_SUBDIR=osx-64 method, note that if you install additional packages into the environment you will need to add CONDA_SUBDIR=osx-64 to the beginning of each command. Alternatively, you could set this environment variable in your shell, but we recommend using the conda config --add subdirs osx-64 method because it is less error prone.

Docker

CRISPResso2 can be used via the Docker containerization system. This system allows CRISPResso2 to run on your system without configuring and installing additional packages. To run CRISPResso2, first download and install docker: https://docs.docker.com/engine/installation/

Next, Docker must be configured to access your hard drive and to run with sufficient memory. These parameters can be found in the Docker settings menu. To allow Docker to access your hard drive, select 'Shared Drives' and make sure your drive name is selected. To adjust the memory allocation, select the 'Advanced' tab and allocate at least 4G of memory.

To run CRISPResso2, make sure Docker is running, then open a command prompt (Mac) or PowerShell (Windows). Change directories to the location of your data, and run the following command:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso -h

The first time you run this command, it will download the Docker image. The -v parameter mounts the current directory to be accessible by CRISPResso2, and the -w parameter sets the CRISPResso2 working directory. As long as you are running the command from the directory containing your data, you should not change the Docker -v or -w parameters.

Additional parameters for CRISPResso2 as described below can be added to this command. For example,

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso -r1 sample.fastq.gz -a ATTAACCAAG

CRISPResso2 usage

CRISPResso2 is designed to be run on a single amplicon. For experiments involving multiple amplicons in the same fastq, see the instructions for CRISPRessoPooled or CRISPRessoWGS below.

CRISPResso2 requires only two parameters: input sequences in the form of fastq files (given by the --fastq_r1 and --fastq_r2 parameters), and the amplicon sequence to align to (given by the --amplicon_seq parameter). For example:

Using Bioconda:

CRISPResso --fastq_r1 reads.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 reads.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT
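When launching many runs from a script, it can help to assemble the command programmatically. The sketch below builds an argument list using only the flags described in this document (--fastq_r1, --fastq_r2, --amplicon_seq, --name); the helper name and the short amplicon used in the example are placeholders, not part of CRISPResso2.

```python
import shlex


def build_crispresso_command(fastq_r1, amplicon_seq, fastq_r2=None, name=None):
    """Assemble a CRISPResso argument list (hypothetical helper)."""
    cmd = ["CRISPResso", "--fastq_r1", fastq_r1, "--amplicon_seq", amplicon_seq]
    if fastq_r2:
        cmd += ["--fastq_r2", fastq_r2]  # second file for paired-end reads
    if name:
        cmd += ["--name", name]  # output name of the report
    return cmd


# Placeholder amplicon; substitute the full amplicon sequence for a real run.
cmd = build_crispresso_command("reads.fastq.gz", "AATGTCCCCCAATGGG")
print(shlex.join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with long sequences and file names.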

Guardrails

Guardrails automatically check the inputs and results of experiments against standardized values. The guardrail warnings that are triggered are printed in the commandline and at the top of generated reports. In order to turn off the guardrails, add the --disable_guardrails argument.

  • TotalReadsGuardrail Checks if the number of reads is lower than expected. (Default: 10000)
  • OverallReadsAlignedGuardrail Checks if the number of aligned reads is lower than expected. (Default: 90% of the total reads)
  • DisproportionateReadsAlignedGuardrail Checks if the number of reads aligned to an amplicon is higher or lower than expected proportionally. (Default: 30% more or less than expected)
  • LowRatioOfModsInWindowToOutGuardrail Checks if the ratio of modifications inside to outside the quantification window is lower than expected. (Default: 0.01)
  • HighRateOfModificationAtEndsGuardrail Checks if there is a high rate of modifications at the ends of the read. (Default: 0.01)
  • HighRateOfSubstitutionsOutsideWindowGuardrail Checks if there is a high rate of substitutions outside of the quantification windows. (Default: 0.002)
  • HighRateOfSubstitutionsGuardrail Checks if the proportion of substitutions to other modifications is higher than expected. (Default: 0.3)
  • ShortSequenceGuardrail Checks if the provided sequences (both Amplicons and Guides) are shorter than expected. (Amplicon Default: 50, Guide Default: 19)
  • LongAmpliconShortReadsGuardrail Checks if the provided amplicon is more than <value> times the average length of read. (Default: 1.5)
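To make the idea of a guardrail concrete, the sketch below re-creates the logic of the first two checks using the default thresholds listed above (10000 reads, 90% aligned). The function and message wording are illustrative assumptions and do not reproduce CRISPResso2's internal classes.

```python
def check_read_guardrails(total_reads, aligned_reads,
                          min_total_reads=10000, min_aligned_fraction=0.9):
    """Return a list of warning strings (hypothetical helper).

    Thresholds default to the values documented for TotalReadsGuardrail
    and OverallReadsAlignedGuardrail.
    """
    warnings = []
    if total_reads < min_total_reads:
        warnings.append(
            f"Low number of total reads: {total_reads} < {min_total_reads}")
    if total_reads > 0 and aligned_reads / total_reads < min_aligned_fraction:
        warnings.append(
            f"Only {aligned_reads} of {total_reads} reads aligned "
            f"(expected at least {min_aligned_fraction:.0%})")
    return warnings
```

For example, a run with 25,000 input reads of which 24,000 align would produce no warnings under these defaults.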

Example run: Non-homologous end joining (NHEJ)

Download the test datasets nhej.r1.fastq.gz and nhej.r2.fastq.gz to your current directory. This is the first 25,000 sequences from a paired-end sequencing experiment. To analyze this experiment, run the command:

Using Bioconda:

CRISPResso --fastq_r1 nhej.r1.fastq.gz --fastq_r2 nhej.r2.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT -n nhej

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 nhej.r1.fastq.gz --fastq_r2 nhej.r2.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT -n nhej

This should produce a folder called 'CRISPResso_on_nhej'. Open the file called CRISPResso_on_nhej/CRISPResso2_report.html in a web browser, and you should see an output like this: CRISPResso2_report.html.

Example run: Multiple alleles

Download the test dataset allele_specific.fastq.gz to your current directory. This is the first 25,000 sequences from an editing experiment targeting one allele. To analyze this experiment, run the following command:

Using Bioconda:

CRISPResso --fastq_r1 allele_specific.fastq.gz --amplicon_seq CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCACTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG,CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCCCTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG --amplicon_name P23H,WT --guide_seq GTGCGGAGCCACTTCGAGCAGC

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 allele_specific.fastq.gz --amplicon_seq CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCACTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG,CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCCCTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG --amplicon_name P23H,WT --guide_seq GTGCGGAGCCACTTCGAGCAGC

This should produce a folder called 'CRISPResso_on_allele_specific'. Open the file called CRISPResso_on_allele_specific/CRISPResso2_report.html in a web browser, and you should see an output like this: CRISPResso2_report.html.

Example run: Base editing experiment

Download the test dataset base_editor.fastq.gz to your current directory. This is the first 25,000 sequences from an editing experiment performed at the EMX1 locus. To analyze this experiment, run the following command:

Using Bioconda:

CRISPResso --fastq_r1 base_editor.fastq.gz --amplicon_seq GGCCCCAGTGGCTGCTCTGGGGGCCTCCTGAGTTTCTCATCTGTGCCCCTCCCTCCCTGGCCCAGGTGAAGGTGTGGTTCCAGAACCGGAGGACAAAGTACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACCTCCAATGACTAGGGTGG --guide_seq GAGTCCGAGCAGAAGAAGAA --quantification_window_size 10 --quantification_window_center -10 --base_editor_output

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 base_editor.fastq.gz --amplicon_seq GGCCCCAGTGGCTGCTCTGGGGGCCTCCTGAGTTTCTCATCTGTGCCCCTCCCTCCCTGGCCCAGGTGAAGGTGTGGTTCCAGAACCGGAGGACAAAGTACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACCTCCAATGACTAGGGTGG --guide_seq GAGTCCGAGCAGAAGAAGAA --quantification_window_size 10 --quantification_window_center -10 --base_editor_output

This should produce a folder called 'CRISPResso_on_base_editor'. Open the file called CRISPResso_on_base_editor/CRISPResso2_report.html in a web browser, and you should see an output like this: CRISPResso2_report.html.

Parameter List

-h or --help: show a help message and exit.

-r1 or --fastq_r1: The first fastq file.

-r2 or --fastq_r2 FASTQ_R2: The second fastq file for paired end reads.

-a or --amplicon_seq: The amplicon sequence used for the experiment.

-an or --amplicon_name: A name for the reference amplicon can be given. If multiple amplicons are given, multiple names can be specified here. Because amplicon names are used as output filename prefixes, amplicon names are truncated to 21 characters unless the parameter --suppress_amplicon_name_truncation is set. (default: Reference)

-g or --guide_seq: sgRNA sequence, if more than one, please separate by commas. Note that the sgRNA needs to be input as the guide RNA sequence (usually 20 nt) immediately adjacent to but not including the PAM sequence (5' of NGG for SpCas9). If the PAM is found on the opposite strand with respect to the Amplicon Sequence, ensure the sgRNA sequence is also found on the opposite strand. The CRISPResso convention is to depict the expected cleavage position using the value of the parameter '--quantification_window_center' nucleotides from the 3' end of the guide. In addition, the use of alternate nucleases besides SpCas9 is supported. For example, if using the Cpf1 system, enter the sequence (usually 20 nt) immediately 3' of the PAM sequence and explicitly set the '--cleavage_offset' parameter to 1, since the default setting of -3 is suitable only for SpCas9. (default: )
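The relationship between the guide, its strand, and the offset can be sketched as follows. This is a minimal illustration that assumes the offset is counted from the 3' end of the guide as described above, and that returns the 0-based index of the amplicon base immediately to the left of the predicted cut; the function names are hypothetical and only the first match is considered.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cut_index(amplicon, guide, offset=-3):
    """Index of the amplicon base just left of the predicted cut, or None.

    offset plays the role of --quantification_window_center
    (-3 for SpCas9, measured from the 3' end of the guide).
    """
    amplicon, guide = amplicon.upper(), guide.upper()
    pos = amplicon.find(guide)
    if pos != -1:
        # Guide on the amplicon strand: its 3' end is its right edge.
        return pos + len(guide) + offset - 1
    pos = amplicon.find(reverse_complement(guide))
    if pos != -1:
        # Guide on the opposite strand: its 3' end is the left edge of the match.
        return pos - offset - 1
    return None
```

For a 20 nt guide starting at amplicon position 0 with the default offset of -3, this returns 16, i.e. the cut falls between the 17th and 18th guide bases.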

-e or --expected_hdr_amplicon_seq: Amplicon sequence expected after HDR. The expected HDR amplicon sequence can be provided to quantify the number of reads showing a successful HDR repair. Note that the entire amplicon sequence must be provided, not just the donor template. CRISPResso2 will quantify identified instances of NHEJ, HDR, or mixed editing events. (default: )

-c or --coding_seq: Subsequence/s of the amplicon sequence covering one or more coding sequences for frameshift analysis. Sequences of exons within the amplicon sequence can be provided to enable frameshift analysis and splice site analysis by CRISPResso2. If more than one (for example, split by intron/s), please separate by commas. Users should provide the subsequences of the reference amplicon sequence that correspond to coding sequences (not the whole exon sequence(s)!). (default: )
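The basic arithmetic behind frameshift analysis is that an indel preserves the reading frame only when its net length change is a multiple of three. The sketch below shows that rule in isolation; it is a simplification, since it ignores where the indel falls relative to the coding sequence and splice sites, and the function name is hypothetical.

```python
def classify_frame(n_inserted, n_deleted):
    """Classify a read by the net length change of its indels (hypothetical helper)."""
    if n_inserted == 0 and n_deleted == 0:
        return "no indel"
    net_change = n_inserted - n_deleted
    return "in-frame" if net_change % 3 == 0 else "frameshift"
```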

sgRNA parameters

-gn or --guide_name: sgRNA names, if more than one, please separate by commas. (default: sgRNA)

-fg or --flexiguide: sgRNA sequence (flexible). The flexiguide sequence will be aligned to the amplicon sequence(s), as long as the guide sequence has homology as set by --flexiguide_homology. (default: '')

-fh or --flexiguide_homology: flexiguides will yield guides in amplicons with at least this homology to the flexiguide sequence (default: 80, meaning 80% homology is required)

-fgn or --flexiguide_name: Names for the flexiguides, similar to --guide_name. (default: '')

--discard_guide_positions_overhanging_amplicon_edge: If set, for guides that align to multiple positions, guide positions will be discarded if plotting around those regions would include bp that extend beyond the end of the amplicon. (default: False)

Read filtering, trimming, and merging parameters

--split_interleaved_input: Splits a single fastq file containing paired end reads into two files before running CRISPResso (default: False)

-q or --min_average_read_quality: Minimum average quality score (phred33) to keep a read (default: 0)

-s or --min_single_bp_quality: Minimum single bp score (phred33) to keep a read (default: 0)

--min_bp_quality_or_N: Bases with a quality score (phred33) less than this value will be set to "N" (default: 0)
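The effect of --min_bp_quality_or_N can be pictured as a per-base mask rather than a per-read filter. The following sketch, with a hypothetical function name, replaces each base whose phred33 score is below the threshold with "N" and keeps the read.

```python
def mask_low_quality(sequence, quality_string, min_bp_quality):
    """Replace bases with quality below min_bp_quality by 'N' (hypothetical helper)."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    return "".join(
        base if ord(qual) - 33 >= min_bp_quality else "N"
        for base, qual in zip(sequence, quality_string)
    )
```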

--trim_sequences: Enable the trimming of Illumina adapters with fastp (default: False)

--fastp_options_string: Override options for fastp, e.g. --length_required 70 --umi

--min_paired_end_reads_overlap: Parameter for the fastp read merging step. Minimum required overlap length between two reads to provide a confident overlap. (default: 10)

Quantification window parameters

-w or --quantification_window_size or --window_around_sgrna: Defines the size (in bp) of the quantification window extending from the position specified by the "--cleavage_offset" or "--quantification_window_center" parameter in relation to the provided guide RNA sequence(s) (--guide_seq). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp. (default: 1)

-wc or --quantification_window_center or --cleavage_offset: Center of quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate, for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17. (default: -3)

-qwc or --quantification_window_coordinates: Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of the "--quantification_window_center", "--cleavage_offset", "--quantification_window_size" and "--window_around_sgrna" parameters. Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign like "start-stop", and multiple ranges can be separated by the underscore (_). A value of 0 disables this filter. This can be a comma-separated list of values corresponding to the amplicon sequences given in --amplicon_seq; e.g. 5-10,5-10_20-30 would specify the 5th-10th bp in the first reference and the 5th-10th and 20th-30th bp in the second reference. Note that if there are multiple amplicons provided, and only one quantification window coordinate is provided, the same quantification window will be used for all amplicons and be adjusted to account for insertions/deletions. (default: None)
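The coordinate syntax (commas between amplicons, underscores between ranges, a dash between start and stop) can be illustrated with a small parser. This is only a sketch of the format as described above; the function name is hypothetical, and an entry of "0" is represented here as an empty list of ranges.

```python
def parse_window_coordinates(spec):
    """Parse e.g. '5-10,5-10_20-30' into one list of (start, stop) per amplicon."""
    per_amplicon = []
    for amplicon_spec in spec.split(","):
        amplicon_spec = amplicon_spec.strip()
        if amplicon_spec == "0":
            per_amplicon.append([])  # '0' disables the filter for this amplicon
            continue
        ranges = []
        for part in amplicon_spec.split("_"):
            start_text, stop_text = part.split("-")
            start, stop = int(start_text), int(stop_text)
            if start > stop:
                raise ValueError(f"start exceeds stop in range '{part}'")
            ranges.append((start, stop))
        per_amplicon.append(ranges)
    return per_amplicon
```

Applied to the example from the parameter description, the first amplicon gets one range and the second gets two.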

--exclude_bp_from_left: Exclude bp from the left side of the amplicon sequence for the quantification of the indels (default: 15)

--exclude_bp_from_right: Exclude bp from the right side of the amplicon sequence for the quantification of the indels (default: 15)

--ignore_substitutions: Ignore substitution events for the quantification and visualization (default: False)

--ignore_insertions: Ignore insertion events for the quantification and visualization (default: False)

--ignore_deletions: Ignore deletion events for the quantification and visualization (default: False)

--discard_indel_reads: Discard reads with indels in the quantification window from analysis (default: False)

Read alignment parameters

-amas or --amplicon_min_alignment_score: Amplicon Minimum Alignment Score; score between 0 and 100; sequences must have at least this homology score with the amplicon to be aligned (can be comma-separated list of multiple scores, corresponding to amplicon sequences given in --amplicon_seq) After reads are aligned to a reference sequence, the homology is calculated as the number of bp they have in common. If the aligned read has a homology less than this parameter, it is discarded. This is useful for filtering erroneous reads that do not align to the target amplicon, for example arising from alternate primer locations. (default: 60)
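A homology score of this kind can be sketched as the percentage of alignment columns in which the read and the reference carry the same base. The choice of denominator (the full alignment length, gaps included) is an assumption made for this illustration, and the function name is hypothetical.

```python
def percent_identity(aligned_read, aligned_ref):
    """Percentage of alignment columns with identical, non-gap bases."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_read:
        return 0.0
    matches = sum(
        1 for read_base, ref_base in zip(aligned_read, aligned_ref)
        if read_base == ref_base and read_base != "-"
    )
    return 100.0 * matches / len(aligned_read)
```

Under this definition, a read whose score falls below the threshold (60 by default) would be discarded.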

--default_min_aln_score or --min_identity_score: Default minimum homology score for a read to align to a reference amplicon (default: 60)

--expand_ambiguous_alignments: If more than one reference amplicon is given, reads that align to multiple reference amplicons will count equally toward each amplicon. Default behavior is to exclude ambiguous alignments. (default: False)

--assign_ambiguous_alignments_to_first_reference: If more than one reference amplicon is given, ambiguous reads that align with the same score to multiple amplicons will be assigned to the first amplicon. Default behavior is to exclude ambiguous alignments. (default: False)

--needleman_wunsch_gap_open: Gap open option for Needleman-Wunsch alignment (default: -20)

--needleman_wunsch_gap_extend: Gap extend option for Needleman-Wunsch alignment (default: -2)

--needleman_wunsch_gap_incentive: Gap incentive value for inserting indels at cut sites (default: 1)

--needleman_wunsch_aln_matrix_loc: Location of the matrix specifying substitution scores in the NCBI format (see ftp://ftp.ncbi.nih.gov/blast/matrices/) (default: EDNAFULL)
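To show how a gap incentive can influence a global alignment, the sketch below computes a Needleman-Wunsch score in which gaps at chosen reference positions receive a bonus. It is deliberately simplified: it uses a single linear gap penalty instead of separate open and extend penalties, fixed match/mismatch scores of 5 and -4 instead of a full substitution matrix, and it returns only the score. The function name and the indexing of the incentive vector are assumptions for illustration, not CRISPResso2's implementation.

```python
def global_alignment_score(read, ref, gap_incentive=None,
                           match=5, mismatch=-4, gap=-20):
    """Needleman-Wunsch score with a position-specific gap bonus.

    gap_incentive[j] is added to any gap placed after j reference bases
    have been consumed (list of length len(ref) + 1).
    """
    n, m = len(read), len(ref)
    if gap_incentive is None:
        gap_incentive = [0] * (m + 1)
    # score[i][j]: best score for aligning read[:i] with ref[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap + gap_incentive[0]
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap + gap_incentive[j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if read[i - 1] == ref[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + substitution
            insertion = score[i - 1][j] + gap + gap_incentive[j]  # extra read base
            deletion = score[i][j - 1] + gap + gap_incentive[j]   # missing ref base
            score[i][j] = max(diagonal, insertion, deletion)
    return score[n][m]
```

In this toy model, placing a bonus at the expected cut site raises the score of alignments that put their gap there, which is the intuition behind favoring indels at cut sites.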

Base editing parameters

--base_editor_output: Outputs plots and tables to aid in analysis of base editor studies. If base editor output is selected, plots showing the frequency of substitutions in the quantification window are generated. The target and result bases can also be set to measure the rate of on-target conversion at bases in the quantification window. (default: False)

--conversion_nuc_from: For base editor plots, this is the nucleotide targeted by the base editor (default: C)

--conversion_nuc_to: For base editor plots, this is the nucleotide produced by the base editor (default: T)
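The on-target conversion rate at a single position can be thought of as the share of reads carrying the product base where the reference has the targeted base. The sketch below assumes per-position base counts are already available (for example from a nucleotide frequency table); the function name and input layout are hypothetical.

```python
def conversion_fraction(ref_base, base_counts, nuc_from="C", nuc_to="T"):
    """Fraction of reads showing nuc_to at a position whose reference base is nuc_from.

    base_counts maps each observed base at the position to a read count.
    Returns None when the position is not a target of the editor.
    """
    if ref_base != nuc_from:
        return None
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    return base_counts.get(nuc_to, 0) / total
```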

Prime editing parameters

--prime_editing_pegRNA_spacer_seq: pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence. (default: )

--prime_editing_pegRNA_extension_seq: Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS). (default: )

--prime_editing_pegRNA_extension_quantification_window_size: Quantification window size (in bp) at flap site for measuring modifications anchored at the right side of the extension sequence. Similar to the --quantification_window_size parameter, the total length of the quantification window will be 2x this parameter. Default: 5bp (10bp total window size) (default: 5)

--prime_editing_pegRNA_scaffold_seq: If given, reads containing any of this scaffold sequence before the extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value ends with 'GGCACCGAGUCGGUGC'. (default: )

--prime_editing_pegRNA_scaffold_min_match_length: Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences. (default: 1)

--prime_editing_nicking_guide_seq: Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence (default: )

--prime_editing_override_prime_edited_ref_seq: If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence. (default='')

Plotting parameters

--plot_histogram_outliers: If set, all values will be shown on histograms. By default (if unset), histogram ranges are limited to plotting data within the 99th percentile. (default: False)

Allele plot parameters

--plot_window_size or --offset_around_cut_to_plot: Defines the size of the window extending from the quantification window center to plot. Nucleotides within plot_window_size of the quantification_window_center for each guide are plotted. (default: 20)

--min_frequency_alleles_around_cut_to_plot: Minimum % reads required to report an allele in the alleles table plot. This parameter only affects plotting. All alleles will be reported in data files. (default: 0.2 (i.e. 0.2%))

--max_rows_alleles_around_cut_to_plot: Maximum number of rows to report in the alleles table plot. (default: 50)

--expand_allele_plots_by_quantification: If set, alleles with different modifications in the quantification window (but not necessarily in the plotting window (e.g. for another sgRNA)) are plotted on separate lines, even though they may have the same apparent sequence. To force the allele plot and the allele table to be the same, set this parameter. If unset, all alleles with the same sequence will be collapsed into one row. (default: False)

--allele_plot_pcts_only_for_assigned_reference: If set, in the allele plots, the percentages will show the percentage as a percent of reads aligned to the assigned reference. Default behavior is to show percentage as a percent of all reads. (default: False)

--annotate_wildtype_allele: Wildtype alleles in the allele table plots will be marked with this string (e.g. **). (default: )

Output parameters

--file_prefix: File prefix for output plots and tables (default: )

-n or --name: Output name of the report. If not given, the name is obtained from the filename of the fastq file/s used as input. (default: )

-o or --output_folder: Output folder to use for the analysis (default: current folder)

--write_detailed_allele_table: If set, a detailed allele table will be written including alignment scores for each read sequence. (default: False)

--suppress_amplicon_name_truncation: If set, amplicon names will not be truncated when creating output filename prefixes. If not set, amplicon names longer than 21 characters will be truncated when creating filename prefixes. (default: False)

--fastq_output: If set, a fastq file with annotations for each read will be produced. (default: False)

--bam_output: If set, a bam file with alignments for each read will be produced. Setting this parameter will produce a file called 'CRISPResso_output.bam' with the alignments in bam format. If the bowtie2_index is provided, alignments will be reported in reference to that genome. If the bowtie2_index is not provided, alignments will be reported in reference to a custom reference created by the amplicon sequence(s) and written to the file 'CRISPResso_output.fa'. (default: False)

-x or --bowtie2_index: Basename of Bowtie2 index for the reference genome. Optionally used in the creation of a bam file. See bam_output. (default: '')

--keep_intermediate: Keep all the intermediate files (default: False)

--dump: Dump numpy arrays and pandas dataframes to file for debugging purposes (default: False)

--crispresso1_mode: Output as in CRISPResso1. In particular, if this flag is set, the old output files 'Mapping_statistics.txt', and 'Quantification_of_editing_frequency.txt' are created, and the new files 'nucleotide_frequency_table.txt' and 'substitution_frequency_table.txt' and figure 2a and 2b are suppressed, and the files 'selected_nucleotide_percentage_table.txt' are not produced when the flag --base_editor_output is set (default: False)

--suppress_report: Suppress output report, plots output as .pdf only (not .png) (default: False)

--suppress_plots: Suppress output plots (default: False)

--place_report_in_output_folder: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output. (default: False)

--zip_output: If true, the output folder will be zipped upon completion. If --zip_output is set, --place_report_in_output_folder should also be set; if it is not, it is automatically set to true. (default: False)

Miscellaneous parameters

--auto: Infer amplicon sequence from most common reads (default: False)

--dsODN: dsODN sequence -- Reads containing the dsODN are labeled and quantified. (default: '')

--debug: Show debug messages (default: False)

-v or --verbosity: Verbosity level of output to the console (1-4), 4 is the most verbose. If the parameter --debug is set, --verbosity is overridden and set to 4. (default: 3)

--no_rerun: Don't rerun CRISPResso2 if a run using the same parameters has already been finished. (default: False)

--bam_input BAM_INPUT: Aligned reads for processing in bam format. This parameter can be given instead of fastq_r1 to specify that reads are to be taken from this bam file. An output bam is produced that contains an additional field with CRISPResso2 information. (default: )

--bam_chr_loc BAM_CHR_LOC: Chromosome location in bam for reads to process. For example: "chr1:50-100" or "chrX". (default: )
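The two location forms shown in the example ("chr1:50-100" and "chrX") can be told apart with a small parser. This is a sketch of the string format only, with a hypothetical function name; it makes no claim about how CRISPResso2 interprets the coordinates.

```python
import re

_LOCATION = re.compile(r"^(?P<chrom>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_chr_loc(location):
    """Split 'chr1:50-100' or 'chrX' into (chromosome, start, end).

    start and end are None when only a chromosome name is given.
    """
    match = _LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    if match.group("start") is None:
        return match.group("chrom"), None, None
    return match.group("chrom"), int(match.group("start")), int(match.group("end"))
```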

--disable_guardrails: Don't show the guardrail warnings.

CRISPResso2 output

The output of CRISPResso2 consists of a set of informative graphs that allow for the quantification and visualization of the position and type of outcomes within an amplicon sequence.

Data file descriptions

CRISPResso2_report.html is a summary report that can be viewed in a web browser containing all of the output plots and summary statistics.

Alleles_frequency_table.zip can be unzipped to a tab-separated text file that shows all reads and alignments to references. The first column shows the aligned sequence of the sequenced read. The second column shows the aligned sequence of the reference sequence. Gaps in each of these columns represent insertions and deletions. The next column 'Reference_Name' shows the name of the reference that the read aligned to. The fourth column, 'Read_Status' shows whether the read was modified or unmodified. The fifth through seventh columns ('n_deleted', 'n_inserted', 'n_substituted') show the number of bases deleted, inserted, and substituted as compared to the reference sequence. The eighth column shows the number of reads having that sequence, and the ninth column shows the percentage of all reads having that sequence.
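As an example of working with this file downstream, the sketch below opens the zip archive with the Python standard library and computes the fraction of reads marked as modified. It relies on the column layout described above (a 'Read_Status' column and the read count in the eighth column); treating the status value case-insensitively as "MODIFIED" and taking the first file in the archive are assumptions, and the function name is hypothetical.

```python
import csv
import io
import zipfile


def modified_read_fraction(zip_path_or_file):
    """Fraction of reads whose Read_Status is 'modified', weighted by read count."""
    modified = total = 0
    with zipfile.ZipFile(zip_path_or_file) as archive:
        member = archive.namelist()[0]  # assumption: the table is the only member
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            reader = csv.reader(text, delimiter="\t")
            header = next(reader)
            status_col = header.index("Read_Status")
            count_col = 7  # eighth column: number of reads with this sequence
            for row in reader:
                count = int(float(row[count_col]))
                total += count
                if row[status_col].strip().upper() == "MODIFIED":
                    modified += count
    return modified / total if total else 0.0
```

The function accepts either a path such as 'CRISPResso_on_nhej/Alleles_frequency_table.zip' or an open binary file object.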

CRISPResso_mapping_statistics.txt is a tab-delimited text file showing the number of reads in the input (READS IN INPUTS), the number of reads after filtering, trimming and merging (READS AFTER PREPROCESSING), the number of reads aligned (READS ALIGNED), and the number of reads for which the alignment had to be computed vs. read from cache.

CRISPResso_quantification_of_editing_frequency.txt is a tab-delimited text file showing the number of reads aligning to each reference amplicon, as well as the status (modified/unmodified, number of insertions, deletions, and/or substitutions) of those reads.

CRISPResso_RUNNING_LOG.txt is a text file and shows a log of the CRISPResso run.

CRISPResso2_info.json can be read by other CRISPResso tools and contains information about the run and results.

The remainder of the files are produced for each amplicon, and each file is prefixed by the name of the amplicon if more than one amplicon is given.

Alleles_frequency_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows alleles and alignments to the specified reference for a subsequence around the sgRNA (here, shown by 'NNNNN'). This data report is produced for each amplicon when a guide is found in the amplicon sequence. A report is generated for each guide. The number of nucleotides shown in this report can be modified by changing the --plot_window_size parameter.

Substitution_frequency_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows the frequency of substitutions in the amplicon sequence around the sgRNA (here, shown by 'NNNNN'). The first row shows the reference sequence. The following rows show the number of substitutions to each base. For example, the first numeric value in the second row (marked ‘A’) shows the number of bases that have a substitution resulting in an A at the first basepair of the amplicon sequence. The number of unmodified bases at each position is not shown in this table (because they aren’t substitutions). Thus, if the first basepair of the amplicon sequence is an A, the first value in the row marked ‘A’ will show 0. A report is generated for each guide. The number of nucleotides shown in this report can be modified by changing the --plot_window_size parameter.

Substitution_frequency_table.txt is a tab-separated text file that shows the frequency of substitutions in the amplicon sequence across the entire amplicon. The first row shows the reference sequence. The following rows show the number of substitutions to each base. For example, the first numeric value in the second row (marked ‘A’) shows the number of bases that have a substitution resulting in an A at the first basepair of the amplicon sequence. The number of unmodified bases at each position is not shown in this table (because they aren’t substitutions). Thus, if the first basepair of the amplicon sequence is an A, the first value in the row marked ‘A’ will show 0.

Insertion_histogram.txt is a tab-separated text file that shows a histogram of the insertion sizes in the amplicon sequence in the quantification window. Insertions outside of the quantification window are not included. The ins_size column shows the insertion length, and the fq column shows the number of reads having that insertion size.

Deletion_histogram.txt is a tab-separated text file that shows a histogram of the deletion sizes in the amplicon sequence in the quantification window. Deletions outside of the quantification window are not included. The del_size column shows the length of the deletion, and the fq column shows the number of reads having that deletion size.

Substitution_histogram.txt is a tab-separated text file that shows a histogram of the number of substitutions in the amplicon sequence in the quantification window. Substitutions outside of the quantification window are not included. The sub_count column shows the number of substitutions, and the fq column shows the number of reads having that number of substitutions.
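The three histogram files share the same two-column shape, so a read-weighted mean can be computed the same way for each. The following is a minimal sketch that assumes the files carry a header row with the column names given above (ins_size, del_size or sub_count, plus fq); the function name mean_size and the example values are made up for illustration.

```python
import csv
import io

def mean_size(handle, size_column):
    """Read-weighted mean of a size column, using the fq column as weights."""
    total_reads = 0.0
    weighted_sum = 0.0
    for row in csv.DictReader(handle, delimiter="\t"):
        reads = float(row["fq"])
        total_reads += reads
        weighted_sum += float(row[size_column]) * reads
    return weighted_sum / total_reads if total_reads else 0.0

# Made-up histogram for illustration only.
example = "ins_size\tfq\n0\t90\n1\t6\n3\t4\n"
print(mean_size(io.StringIO(example), "ins_size"))
```

Passing "del_size" or "sub_count" as the second argument would apply the same calculation to the other two files.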

Effect_vector_insertion.txt is a tab-separated text file with a one-row header that shows the percentage of reads with an insertion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with an insertion at that location.

Effect_vector_deletion.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a deletion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a deletion at that location.

Effect_vector_substitution.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a substitution at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a substitution at that location.

Effect_vector_combined.txt is a tab-separated text file with a one-row header that shows the percentage of reads with any modification (insertion, deletion, or substitution) at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a modification at that location.
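Because every effect vector file has the same two-column layout, locating the most frequently modified position takes only a few lines. This is a sketch under stated assumptions: the header names in the example are invented (the text above only says there is a one-row header), positions are assumed to be written as integers, and most_modified_position is a hypothetical helper.

```python
import csv
import io

def most_modified_position(handle):
    """Return (position, percentage) for the row with the highest percentage."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the one-row header
    rows = [(int(row[0]), float(row[1])) for row in reader if row]
    return max(rows, key=lambda item: item[1])

# Made-up effect vector for illustration only.
example = "position\tpercent\n1\t0.0\n2\t1.5\n3\t12.0\n4\t3.0\n"
print(most_modified_position(io.StringIO(example)))
```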

Modification_count_vectors.txt is a tab-separated file showing the number of modifications for each position in the amplicon. The first row shows the amplicon sequence, and successive rows show the number of reads with insertions (row 2), insertions_left (row 3), deletions (row 4), substitutions (row 5) and the sum of all modifications (row 6). Additionally, the last row shows the number of reads aligned.

If an insertion occurs between bases 5 and 6, the insertions vector will be incremented at bases 5 and 6. However, the insertions_left vector will only be incremented at base 5 so the sum of the insertions_left row represents an accurate count of the number of insertions, whereas the sum of the insertions row will yield twice the number of insertions.
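The difference between the two insertion rows can be reproduced with a toy counter. This sketch only illustrates the counting rule described above and is not CRISPResso2's implementation; insertion_vectors and its arguments are hypothetical.

```python
def insertion_vectors(amplicon_length, insertion_sites):
    """Build both insertion rows.

    Each entry of insertion_sites is the 1-based base b to the left of an
    insertion, meaning the insertion lies between bases b and b + 1.
    """
    insertions = [0] * amplicon_length
    insertions_left = [0] * amplicon_length
    for left_base in insertion_sites:
        insertions[left_base - 1] += 1       # base to the left of the insertion
        insertions[left_base] += 1           # base to the right of the insertion
        insertions_left[left_base - 1] += 1  # left base only
    return insertions, insertions_left

# Three insertions: two between bases 5 and 6, one between bases 8 and 9.
ins, ins_left = insertion_vectors(10, [5, 5, 8])
print(sum(ins), sum(ins_left))
```

Summing the insertions_left row gives the number of insertions, while the insertions row sums to twice that number.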

Quantification_window_modification_count_vectors.txt is a tab-separated file showing the number of modifications for positions in the quantification window of the amplicon. The first row shows the amplicon sequence in the quantification window, and successive rows show the number of reads with insertions (row 2), insertions_left (row 3), deletions (row 4), substitutions (row 5) and the sum of all modifications (row 6). Additionally, the last row shows the number of reads aligned.

Nucleotide_frequency_table.txt is a tab-separated file showing the number of each residue at each position in the amplicon. The first row shows the amplicon sequence, and successive rows show the number of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

Quantification_window_nucleotide_frequency_table.txt is a tab-separated file showing the number of each residue at positions in the quantification window of the amplicon. The first row shows the amplicon sequence in the quantification window, and successive rows show the number of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

Nucleotide_percentage_table.txt is a tab-separated file showing the percentage of each residue at each position in the amplicon. The first row shows the amplicon sequence, and successive rows show the percentage of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

Quantification_window_nucleotide_percentage_table.txt is a tab-separated file showing the percentage of each residue at positions in the quantification window of the amplicon. The first row shows the amplicon sequence in the quantification window, and successive rows show the percentage of reads with an A (row 2), C (row 3), G (row 4), T (row 5), N (row 6), or a deletion (-) (row 7) at each position.

The following report files are produced when the base editor mode is enabled:

Selected_nucleotide_percentage_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows the percentage of each base at selected nucleotides in the amplicon sequence around the sgRNA (here, shown by 'NNNNN'). If the base editing experiment targets cytosines (as set by the --base_editor_from parameter), each C in the quantification window will be numbered (e.g. C5 represents the cytosine at the 5th position in the selected nucleotides). The percentage of each base at these selected target cytosines is reported, with the first row showing the numbered cytosines, and the remainder of the rows showing the percentage of each nucleotide present at these locations. This file shows nucleotides within '--plot_window_size' bp of the position specified by the parameter '--quantification_window_center' relative to the 3' end of each guide.

Selected_nucleotide_frequency_table_around_sgRNA_NNNNN.txt is a tab-separated text file that shows the frequency of each base at selected nucleotides in the amplicon sequence around the sgRNA (here, shown by 'NNNNN'). If the base editing experiment targets cytosines (as set by the --base_editor_from parameter), each C in the quantification window will be numbered (e.g. C5 represents the cytosine at the 5th position in the selected nucleotides). The frequency of each base at these selected target cytosines is reported, with the first row showing the numbered cytosines, and the remainder of the rows showing the frequency of each nucleotide present at these locations. This file shows nucleotides within '--plot_window_size' bp of the position specified by the parameter '--quantification_window_center' relative to the 3' end of each guide.
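The numbering of target bases can be pictured with a small helper. This is only a sketch of the labeling idea: it assumes positions are counted from 1 at the start of the selected nucleotides, and label_target_bases and the example sequence are hypothetical rather than taken from CRISPResso2.

```python
def label_target_bases(selected_seq, target_base="C"):
    """Label each occurrence of target_base with its 1-based position."""
    return [
        f"{target_base}{position}"
        for position, base in enumerate(selected_seq.upper(), start=1)
        if base == target_base
    ]

# Made-up sequence for illustration only.
print(label_target_bases("ACGCTC"))
```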

The following report files are produced when the amplicon contains a coding sequence:

Frameshift_analysis.txt is a text file describing the number of noncoding, in-frame, and frameshift mutations. This report file is produced when the amplicon contains a coding sequence.

Splice_sites_analysis.txt is a text file describing the number of splicing sites that are unmodified and modified. This report file is produced when the amplicon contains a coding sequence.

Effect_vector_insertion_noncoding.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a noncoding insertion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a noncoding insertion at that location. This report file is produced when the amplicon contains a coding sequence.

Effect_vector_deletion_noncoding.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a noncoding deletion at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a noncoding deletion at that location. This report file is produced when the amplicon contains a coding sequence.

Effect_vector_substitution_noncoding.txt is a tab-separated text file with a one-row header that shows the percentage of reads with a noncoding substitution at each base in the reference sequence. The first column shows the 1-based position of the amplicon, and the second column shows the percentage of reads with a noncoding substitution at that location. This report file is produced when the amplicon contains a coding sequence.

Troubleshooting

Please check that your input file(s) are in FASTQ format (compressed fastq.gz also accepted).

If you get an empty report, please double check that your amplicon sequence is correct and in the correct orientation. It can be helpful to inspect the first few lines of your FASTQ file - the start of the amplicon sequence should match the start of your sequences. If not, check to see if the files are trimmed (see point below).

It is important to determine whether your reads are trimmed or not. CRISPResso2 assumes that the reads ARE ALREADY TRIMMED! If reads are not already trimmed, select the adapters used for trimming under the ‘Trimming Adapter’ heading under the ‘Optional Parameters’. This is FUNDAMENTAL to CRISPResso analysis. Failure to trim adapters may result in false positives: the report will show an unrealistic 100% modified alleles and a sharp peak at the edges of the reference amplicon in figure 4.

The quality filter assumes that your reads use the Phred33 scale, and it should be adjusted for each user’s specific application. A reasonable value for this parameter is 30.

If your amplicon sequence is longer than your sequenced read length, the R1 and R2 reads should overlap by at least 10bp. For example, if you sequence using 150bp reads, the maximum amplicon length should be 290 bp.
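The arithmetic behind the 290 bp figure is two read lengths minus the required overlap. A minimal sketch, with max_amplicon_length as a hypothetical helper name:

```python
def max_amplicon_length(read_length, min_overlap=10):
    """Longest amplicon that paired reads can cover with the given overlap."""
    return 2 * read_length - min_overlap

print(max_amplicon_length(150))
```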

Especially in repetitive regions, multiple alignments may have the best score. If you want to investigate alternate best-scoring alignments, you can view all alignments using this tool: http://rna.informatik.uni-freiburg.de/Teaching/index.jsp?toolName=Gotoh. As input, sequences from the 'Alleles_frequency_table.txt' can be used. Specifically, for a given row, the value in the 'Aligned_Sequence' should be entered into the 'Sequence a' box after removing any dashes, and the value in the 'Reference_Sequence' should be entered into the 'Sequence b' box after removing any dashes. The alternate alignments can be selected in the 'Results' panel in the Output section.

Alternate running modes

CRISPResso2 can be run for many fastqs (CRISPRessoBatch), for many amplicons in the same fastq (CRISPRessoPooled), or for whole-genome sequencing (CRISPRessoWGS).

CRISPRessoBatch

CRISPRessoBatch allows users to specify input files and other command line arguments in a single file, and then to run CRISPResso2 analysis on each file in parallel. Samples for which the amplicon and guide sequences are the same will be compared between batches, producing useful summary tables and comparison plots.

This flexible utility adds four additional parameters:

--batch_settings: This parameter specifies the tab-separated batch file. The batch file consists of a header line listing the parameters specified, and then one line for each sample describing the parameters for that sample. Each of the parameters for CRISPResso2 given above can be specified for each sample. When CRISPRessoBatch is run, additional parameters can be specified that will be applied to all of the samples listed in the batch file. An example batch file looks like:

name	fastq_r1
sample1	sample1.fq
sample2	sample2.fq
sample3	sample3.fq

--skip_failed: If any sample fails, CRISPRessoBatch will exit without completion. However, if this parameter is specified, CRISPRessoBatch will continue and only summarize the statistics of the successfully-completed runs.

-p or --n_processes: This specifies the number of processes to use for quantification. (default: 1)

-bo or --batch_output_folder: Directory where batch analysis output will be stored.
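For larger experiments, the tab-separated file passed to --batch_settings can be generated rather than typed by hand. The sketch below reproduces the example batch file shown above; write_batch_file is a hypothetical helper, and only the name and fastq_r1 columns from that example are used.

```python
import csv
import io

def write_batch_file(handle, samples):
    """Write a header line, then one tab-separated line per sample."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "fastq_r1"])
    for name, fastq_r1 in samples:
        writer.writerow([name, fastq_r1])

samples = [(f"sample{i}", f"sample{i}.fq") for i in range(1, 4)]
buffer = io.StringIO()
write_batch_file(buffer, samples)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("batch.batch", "w", newline="") in place of the in-memory buffer.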

CRISPRessoBatch outputs several summary files and plots:

CRISPRessoBatch_quantification_of_editing_frequency shows the number of reads that were modified for each amplicon in each sample.

CRISPRessoBatch_mapping_statistics.txt aggregates the read mapping data from each sample.

For each amplicon, the following files are produced with the name of the amplicon as the filename prefix:

NUCLEOTIDE_FREQUENCY_SUMMARY.txt and NUCLEOTIDE_PERCENTAGE_SUMMARY.txt aggregate the nucleotide counts and percentages at each position in the amplicon for each sample.

MODIFICATION_FREQUENCY_SUMMARY.txt and MODIFICATION_PERCENTAGE_SUMMARY.txt aggregate the modification frequency and percentage at each position in the amplicon for each sample.

Example run: Batch mode

Download the test dataset files SRR3305543.fastq.gz, SRR3305544.fastq.gz, SRR3305545.fastq.gz, and SRR3305546.fastq.gz to your current directory. These files are the first 25,000 sequences from an editing experiment performed on several base editors. Also include a batch file that lists these files and the sample names: batch.batch. To analyze this experiment, run the following command:

Using Bioconda:

CRISPRessoBatch --batch_settings batch.batch --amplicon_seq CATTGCAGAGAGGCGTATCATTTCGCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCACCTGGATCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCC -p 4 --base_editor_output -g GGAATCCCTTCTGCAGCACC -wc -10 -w 20

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoBatch --batch_settings batch.batch --amplicon_seq CATTGCAGAGAGGCGTATCATTTCGCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCACCTGGATCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCC -p 4 --base_editor_output -g GGAATCCCTTCTGCAGCACC -wc -10 -w 20

This should produce a folder called 'CRISPRessoBatch_on_batch'. Open the file called CRISPRessoBatch_on_batch/CRISPResso2Batch_report.html in a web browser, and you should see an output like this: CRISPResso2Batch_report.html.

CRISPRessoPooled

CRISPRessoPooled is a utility to analyze and quantify targeted sequencing CRISPR/Cas9 experiments involving pooled amplicon sequencing libraries. One common experimental strategy is to pool multiple amplicons (e.g. a single on-target site plus a set of potential off-target sites) into a single deep sequencing reaction (briefly, genomic DNA samples for pooled applications can be prepared by first amplifying the target regions for each gene/target of interest with regions of 150-400bp depending on the desired coverage. In a second round of PCR, with minimized cycle numbers, barcode and adaptors are added. With optimization, these two rounds of PCR can be merged into a single reaction. These reactions are then quantified, normalized, pooled, and undergo quality control before being sequenced). CRISPRessoPooled demultiplexes reads from multiple amplicons and runs the CRISPResso utility with appropriate reads for each amplicon separately.

Usage

This tool can run in 3 different modes:

Amplicons mode: Given a set of amplicon sequences, in this mode the tool demultiplexes the reads, aligning each read to the amplicon with the best alignment, and creates separate compressed FASTQ files, one for each amplicon. Reads that do not align to any amplicon are discarded. After this preprocessing, CRISPResso is run for each FASTQ file, and separate reports are generated, one for each amplicon.

To run the tool in this mode the user must provide:

  1. Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  2. A description file containing the amplicon sequences used to enrich regions in the genome and some additional information. In particular, this file is a tab-delimited text file with up to 14 columns (first 2 columns required):

  • AMPLICON_NAME: an identifier for the amplicon (must be unique).

  • AMPLICON_SEQUENCE: amplicon sequence used in the design of the experiment.

  • sgRNA_SEQUENCE (OPTIONAL): sgRNA sequence used for this amplicon without the PAM sequence. If not available, enter NA.

  • EXPECTED_AMPLICON_AFTER_HDR (OPTIONAL): expected amplicon sequence in case of HDR. If more than one, separate by commas and not spaces. If not available, enter NA.

  • CODING_SEQUENCE (OPTIONAL): Subsequence(s) of the amplicon corresponding to coding sequences. If more than one, separate by commas and not spaces. If not available, enter NA.

  • PRIME_EDITING_PEGRNA_SPACER_SEQ (OPTIONAL): pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence. If not available, enter NA.

  • PRIME_EDITING_NICKING_GUIDE_SEQ (OPTIONAL): Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence. If not available, enter NA.

  • PRIME_EDITING_PEGRNA_EXTENSION_SEQ (OPTIONAL): Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS). If not available, enter NA.

  • PRIME_EDITING_PEGRNA_SCAFFOLD_SEQ (OPTIONAL): If given, reads containing any of this scaffold sequence before extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value ends with 'GGCACCGAGUCGGUGC'. If not available, enter NA.

  • PRIME_EDITING_PEGRNA_SCAFFOLD_MIN_MATCH_LENGTH (OPTIONAL): Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences. If not available, enter NA.

  • PRIME_EDITING_OVERRIDE_PRIME_EDITED_REF_SEQ (OPTIONAL): If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence. If not available, enter NA.

  • QWC or QUANTIFICATION_WINDOW_COORDINATES (OPTIONAL): Bp positions in the amplicon sequence specifying the quantification window. Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign like "start-stop", and multiple ranges can be separated by the underscore (_). A value of 0 disables this filter. If not available, enter NA.

  • W or QUANTIFICATION_WINDOW_SIZE (OPTIONAL): Defines the size (in bp) of the quantification window extending from the position specified by the "--cleavage_offset" or "--quantification_window_center" parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. The default is 1, meaning 1bp on each side of the cleavage position for a total length of 2bp. If not available, enter NA.

  • WC or QUANTIFICATION_WINDOW_CENTER (OPTIONAL): Center of quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate, for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17. (default: -3) If not available, enter NA.
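The QUANTIFICATION_WINDOW_COORDINATES format described in the list above ("start-stop" ranges joined by underscores) can be split into numeric pairs as follows. This is an illustrative sketch only: parse_qwc is a hypothetical helper, the example coordinates are made up, and the sketch makes no claim about whether CRISPResso2 treats the stop position as inclusive.

```python
def parse_qwc(value):
    """Split a coordinates string such as '10-20_50-60' into (start, stop) pairs.

    'NA' (not available) and '0' (filter disabled) both yield an empty list.
    """
    if value in ("NA", "0"):
        return []
    ranges = []
    for part in value.split("_"):
        start, stop = part.split("-")
        ranges.append((int(start), int(stop)))
    return ranges

print(parse_qwc("10-20_50-60"))
```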

A file in the correct format should look like this:

Site1	CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC	CCCTGTGCCCAGCCC	NA	NA
Site2	GTCCTGGTTTTTGGTTTGGGAAATATAGTCATC	NA	GTCCTGGTTTTTGGTTTAAAAAAATATAGTCATC	NA
Site3	TTTCTGGTTTTTGGTTTGGGAAATATAGTCATC	NA	NA	GGAAATATA

The user can easily create this file with any text editor or with spreadsheet software like Excel (Microsoft), Numbers (Apple) or Sheets (Google Docs) and then save it as tab delimited file.
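The description file can also be written from a script, which avoids accidental spaces in place of tabs. The sketch below covers only the first five columns, as in the example above, and writes no header line because the example shows none; amplicon_row and write_amplicons_file are hypothetical helpers.

```python
import csv
import io

def amplicon_row(name, sequence, guide=None, hdr=None, coding=None):
    """Build the first five columns, writing NA for missing optional values."""
    optional = [guide, hdr, coding]
    return [name, sequence] + [value if value else "NA" for value in optional]

def write_amplicons_file(handle, rows):
    """Write one tab-delimited line per amplicon."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

rows = [
    amplicon_row(
        "Site1",
        "CACACTGTGGCCCCTGTGCCCAGCCCTGGGCTCTCTGTACATGAAGCAAC",
        guide="CCCTGTGCCCAGCCC",
    ),
    amplicon_row("Site3", "TTTCTGGTTTTTGGTTTGGGAAATATAGTCATC", coding="GGAAATATA"),
]
buffer = io.StringIO()
write_amplicons_file(buffer, rows)
print(buffer.getvalue())
```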

Example:

Using Bioconda:

CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt --name ONLY_AMPLICONS_SRR1046762 --gene_annotations gencode_v19.gz

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt --name ONLY_AMPLICONS_SRR1046762 --gene_annotations gencode_v19.gz

The output of CRISPRessoPooled Amplicons mode consists of:

  1. REPORT_READS_ALIGNED_TO_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    a. Demultiplexed_fastq.gz_filename: name of the files containing the raw reads for each amplicon.

    b. n_reads: number of reads recovered for each amplicon.

  2. A set of fastq.gz files, one for each amplicon.

  3. A set of folders, one for each amplicon, containing a full CRISPResso report.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

Genome mode: In this mode the tool aligns each read to the best location in the genome. Then potential amplicons are discovered looking for regions with enough reads (the default setting is to have at least 1000 reads, but the parameter can be adjusted with the option --min_reads_to_use_region). If a gene annotation file from UCSC is provided, the tool also reports the overlapping gene/s to the region. In this way it is possible to check if the amplified regions map to expected genomic locations and/or also to pseudogenes or other problematic regions. Finally CRISPResso is run in each region discovered.
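The read-count threshold used for region discovery amounts to a simple filter. The sketch below illustrates that idea on made-up counts; it is not how CRISPRessoPooled is implemented, and regions_with_enough_reads and the region labels are hypothetical.

```python
def regions_with_enough_reads(read_counts, min_reads=1000):
    """Keep only the regions whose read count meets the threshold."""
    return {
        region: count
        for region, count in read_counts.items()
        if count >= min_reads
    }

# Made-up read counts per candidate region, for illustration only.
counts = {"chr1:100-350": 5200, "chr2:900-1150": 120, "chrX:40-290": 1000}
print(regions_with_enough_reads(counts))
```

Lowering min_reads here plays the role that --min_reads_to_use_region plays on the command line.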

To run the tool in this mode the user must provide:

  1. Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  2. The full path of the reference genome in bowtie2 format (e.g. /genomes/human_hg19/hg19). Instructions for building a custom index, as well as precomputed indexes for human and mouse genome assemblies, are available from the bowtie2 website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml.

  3. Optionally the full path of a gene annotations file from UCSC. The user can download this file from the UCSC Genome Browser ( http://genome.ucsc.edu/cgi-bin/hgTables?command=start ) selecting as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (e.g. /genomes/human_hg19/gencode_v19.gz)

Example:

Using Bioconda:

CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -x /GENOMES/hg19/hg19 --name ONLY_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -x /GENOMES/hg19/hg19 --name ONLY_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

The output of CRISPRessoPooled Genome mode consists of:

  1. REPORT_READS_ALIGNED_TO_GENOME_ONLY.txt: this file contains the list of all the regions discovered, one per line with the following information:

  • chr_id: chromosome of the region in the reference genome.

  • bpstart: start coordinate of the region in the reference genome.

  • bpend: end coordinate of the region in the reference genome.

  • fastq_file: location of the fastq.gz file containing the reads mapped to the region.

  • n_reads: number of reads mapped to the region.

  • sequence: the sequence on the reference genome for the region.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the regions with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

This running mode is particularly useful to check for mapping artifacts or contamination in the library. In an optimal experiment, the list of the regions discovered should contain only the regions for which amplicons were designed.

Mixed mode (Amplicons + Genome): in this mode, the tool first aligns reads to the genome and, as in the Genome mode, discovers aligning regions with reads exceeding a tunable threshold. Next it will align the amplicon sequences to the reference genome and will use only the reads that match both the amplicon locations and the discovered genomic locations, excluding spurious reads coming from other regions, or reads not properly trimmed. Finally CRISPResso is run using each of the surviving regions.

To run the tool in this mode the user must provide:

  • Paired-end reads (two files) or single-end reads (single file) in FASTQ format (fastq.gz files are also accepted)

  • A description file containing the amplicon sequences used to enrich regions in the genome and some additional information (as described in the Amplicons mode section).

  • The reference genome in bowtie2 format (as described in Genome mode section).

  • Optionally the gene annotations from UCSC (as described in Genome mode section).

Example:

Using Bioconda:

CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt -x /GENOMES/hg19/hg19 --name AMPLICONS_AND_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 SRR1046762_1.fastq.gz -r2 SRR1046762_2.fastq.gz -f AMPLICONS_FILE.txt -x /GENOMES/hg19/hg19 --name AMPLICONS_AND_GENOME_SRR1046762 --gene_annotations gencode_v19.gz

The output of CRISPRessoPooled Mixed Amplicons + Genome mode consists of these files:

  1. REPORT_READS_ALIGNED_TO_GENOME_AND_AMPLICONS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    a. Amplicon_Specific_fastq.gz_filename: name of the file containing the raw reads recovered for the amplicon.

    b. n_reads: number of reads recovered for the amplicon.

    c. Gene_overlapping: gene/s overlapping the amplicon region.

    d. chr_id: chromosome of the amplicon in the reference genome.

    e. bpstart: start coordinate of the amplicon in the reference genome.

    f. bpend: end coordinate of the amplicon in the reference genome.

    g. Reference_Sequence: sequence in the reference genome for the region mapped for the amplicon.

  2. MAPPED_REGIONS (folder): this folder contains all the fastq.gz files for the discovered regions.

  3. A set of folders with the CRISPResso report on the amplicons with enough reads.

  4. SAMPLES_QUANTIFICATION_SUMMARY.txt: this file contains a summary of the quantification and the alignment statistics for each region analyzed (read counts and percentages for the various classes: Unmodified, NHEJ, point mutations, and HDR).

  5. CRISPRessoPooled_RUNNING_LOG.txt: execution log and messages for the external utilities called.

The Mixed mode combines the benefits of the two previous running modes. In this mode it is possible to recover in an unbiased way all the genomic regions contained in the library, and hence discover contaminations or mapping artifacts. In addition, by knowing the location of the amplicon with respect to the reference genome, reads not properly trimmed or mapped to pseudogenes or other problematic regions will be automatically discarded, providing the cleanest set of reads to quantify the mutations in the target regions with CRISPResso.

If the focus of the analysis is to obtain the best quantification of editing efficiency for a set of amplicons, we suggest running the tool in the Mixed mode. The Genome mode is instead suggested to check problematic libraries, since a report is generated for each region discovered, even if the region is not mappable to any amplicon (however, this may be time consuming). Finally, the Amplicons mode is the fastest, although the least reliable in terms of quantification accuracy.

Parameter List

-f or --amplicons_file: Amplicons description file (default: ''). This file is a tab-delimited text file with up to 14 columns (2 required):

--amplicon_name: an identifier for the amplicon (must be unique)

--amplicon_seq: amplicon sequence used in the experiment

--guide_seq (OPTIONAL): sgRNA sequence used for this amplicon without the PAM sequence. Multiple guides can be given separated by commas and not spaces.

--expected_hdr_amplicon_seq (OPTIONAL): expected amplicon sequence in case of HDR.

--coding_seq (OPTIONAL): Subsequence(s) of the amplicon corresponding to coding sequences. If more than one, separate them by commas and not spaces.

--prime_editing_pegRNA_spacer_seq (OPTIONAL): pegRNA spacer sgRNA sequence used in prime editing. The spacer should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the given sequence.

--prime_editing_nicking_guide_seq (OPTIONAL): Nicking sgRNA sequence used in prime editing. The sgRNA should not include the PAM sequence. The sequence should be given in the RNA 5'->3' order, so for Cas9, the PAM would be on the right side of the sequence.

--prime_editing_pegRNA_extension_seq (OPTIONAL): Extension sequence used in prime editing. The sequence should be given in the RNA 5'->3' order, such that the sequence starts with the RT template including the edit, followed by the Primer-binding site (PBS).

--prime_editing_pegRNA_scaffold_seq (OPTIONAL): If given, reads containing any of this scaffold sequence before the extension sequence (provided by --prime_editing_pegRNA_extension_seq) will be classified as 'Scaffold-incorporated'. The sequence should be given in the 5'->3' order such that the RT template directly follows this sequence. A common value ends with 'GGCACCGAGUCGGUGC'.

--prime_editing_pegRNA_scaffold_min_match_length (OPTIONAL): Minimum number of bases matching scaffold sequence for the read to be counted as 'Scaffold-incorporated'. If the scaffold sequence matches the reference sequence at the incorporation site, the minimum number of bases to match will be minimally increased (beyond this parameter) to disambiguate between prime-edited and scaffold-incorporated sequences.

--prime_editing_override_prime_edited_ref_seq (OPTIONAL): If given, this sequence will be used as the prime-edited reference sequence. This may be useful if the prime-edited reference sequence has large indels or the algorithm cannot otherwise infer the correct reference sequence.

--quantification_window_coordinates (OPTIONAL): Bp positions in the amplicon sequence specifying the quantification window. This parameter overrides the values of "--quantification_window_center", "--cleavage_offset", and "--window_around_sgrna". Any indels/substitutions outside this window are excluded. Indexes are 0-based, meaning that the first nucleotide is position 0. Ranges are separated by the dash sign like "start-stop", and multiple ranges can be separated by the underscore (_). A value of 0 disables this filter. This can be a comma-separated list of values, corresponding to the amplicon sequences given in --amplicon_seq; e.g. 5-10,5-10_20-30 would specify the 5th-10th bp in the first reference and the 5th-10th and 20th-30th bp in the second reference. (default: None)

--quantification_window_size (OPTIONAL): Defines the size (in bp) of the quantification window extending from the position specified by the "--cleavage_offset" or "--quantification_window_center" parameter in relation to the provided guide RNA sequence(s) (--sgRNA). Mutations within this number of bp from the quantification window center are used in classifying reads as modified or unmodified. A value of 0 disables this window and indels in the entire amplicon are considered. Default is 1, 1bp on each side of the cleavage position for a total length of 2bp.

--quantification_window_center (OPTIONAL): Center of the quantification window to use with respect to the 3' end of the provided sgRNA sequence. Remember that the sgRNA sequence must be entered without the PAM. For cleaving nucleases, this is the predicted cleavage position. The default is -3 and is suitable for the Cas9 system. For alternate nucleases, other cleavage offsets may be appropriate; for example, if using Cpf1 this parameter would be set to 1. For base editors, this could be set to -17.
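The three quantification window parameters above can be illustrated with a short sketch. It is not CRISPResso's implementation: both function names are hypothetical, the second helper only handles guides on the forward strand, and the placement of the window around the cut site is an assumption based on the description of the defaults (center -3, size 1, giving 1bp on each side of the cleavage position).

```python
def parse_window_coordinates(value):
    """Parse a --quantification_window_coordinates string.

    Returns one list of (start, stop) tuples per amplicon, in the order
    the amplicons are given. "0" (filter disabled) yields an empty list.
    """
    per_amplicon = []
    for amplicon_spec in value.split(","):
        ranges = []
        if amplicon_spec.strip() not in ("", "0"):
            for part in amplicon_spec.split("_"):   # "_" separates ranges
                start, stop = part.split("-")       # "-" separates start-stop
                ranges.append((int(start), int(stop)))
        per_amplicon.append(ranges)
    return per_amplicon


def window_from_guide(amplicon, guide, center=-3, size=1):
    """0-based amplicon indices of a window around the predicted cut site.

    Assumption: the guide lies on the forward strand and is given without
    the PAM; center is counted from the guide's 3' end.
    """
    start = amplicon.find(guide)
    if start < 0:
        raise ValueError("guide not found on the forward strand of the amplicon")
    if size == 0:
        # A size of 0 disables the window: the whole amplicon is considered.
        return list(range(len(amplicon)))
    three_prime = start + len(guide) - 1   # index of the guide's last base
    left_of_cut = three_prime + center     # base immediately left of the cut
    lo = max(0, left_of_cut - size + 1)
    hi = min(len(amplicon) - 1, left_of_cut + size)
    return list(range(lo, hi + 1))


if __name__ == "__main__":
    print(parse_window_coordinates("5-10,5-10_20-30"))
    guide = "CTACAGAGCCCCAGTCCTGG"
    print(window_from_guide("AAAAA" + guide + "TGGAAAAA", guide))
```

With the default center and size, the second call returns the two positions flanking the predicted Cas9 cut, three bases upstream of the guide's 3' end.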

--gene_annotations: Gene Annotation Table from UCSC Genome Browser Tables http://genome.ucsc.edu/cgi-bin/hgTables?command=start, please select as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (default: '')

-x or --bowtie2_index: Basename of Bowtie2 index for the reference genome. (default: '')

--bowtie2_options_string: Override options for the Bowtie2 alignment command. By default, this is " --end-to-end -N 0 --np 0 -mp 3,2 --score-min L,-5,-3(1-H)" where H is the default homology score. (default: ' --end-to-end -N 0 --np 0 -mp 3,2 --score-min L,-5,-3(1-H)')

--use_legacy_bowtie2_options_string: Use legacy (more stringent) Bowtie2 alignment parameters: " -k 1 --end-to-end -N 0 --np 0 ". (default: False)

--min_reads_to_use_region: Minimum number of reads that align to a region to perform the CRISPResso analysis. (default: 1000)

--skip_failed: Continue with pooled analysis even if one sample fails. (default: False)

--skip_reporting_problematic_regions: Skip reporting of problematic regions. By default, when both amplicons (-f) and genome (-x) are provided, problematic reads that align to the genome but to positions other than where the amplicons align are reported as problematic. (default: False)

--compile_postrun_references: If set, a file will be produced which compiles the reference sequences of frequent amplicons. (default: False)

--compile_postrun_reference_allele_cutoff: Only alleles with at least this percentage frequency in the population will be reported in the postrun analysis. This parameter is given as a percent, so 30 is 30%. (default: 30)

--alternate_alleles: Path to tab-separated file with alternate allele sequences for pooled experiments. This file has the columns "region_name","reference_seqs", and "reference_names" and gives the reference sequences of alternate alleles that will be passed to CRISPResso for each individual region for allelic analysis. Multiple reference alleles and reference names for a given region name are separated by commas (no spaces). (default: '')

--limit_open_files_for_demux: If set, only one file will be opened during demultiplexing of read alignment locations. This will be slightly slower as the reads must be sorted, but may be necessary if the number of amplicons is greater than the number of files that can be opened due to OS constraints. (default: False)

CRISPRessoWGS

CRISPRessoWGS is a utility for the analysis of genome editing experiments from whole genome sequencing (WGS) data. CRISPRessoWGS allows exploring any region of the genome to quantify targeted editing or potential off-target effects. The intended use case for CRISPRessoWGS is the analysis of targeted regions, and WGS reads from those regions will be realigned using CRISPResso's alignment algorithm for more accurate genome editing quantification. To scan the entire genome for mutations, VarScan or MuTect are more suitable, and the identified regions can then be analyzed and visualized using CRISPRessoWGS.

Usage

To run CRISPRessoWGS you must provide:

  1. A genome-aligned BAM file. There are many options available for aligning reads from a WGS experiment to the genome; we suggest using either Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/) or BWA (http://bio-bwa.sourceforge.net/).

  2. A FASTA file containing the reference sequence used to align the reads and create the BAM file (the reference files for the most common organisms can be downloaded from UCSC: http://hgdownload.soe.ucsc.edu/downloads.html. Download and uncompress only the file ending with .fa.gz; for example, for the latest version of the human genome, download and uncompress the file hg38.fa.gz).

  3. Descriptions file containing the coordinates of the regions to analyze and some additional information. In particular, this file is a tab delimited text file with up to 7 columns (4 required):

    • chr_id: chromosome of the region in the reference genome.

    • bpstart: start coordinate of the region in the reference genome.

    • bpend: end coordinate of the region in the reference genome.

    • REGION_NAME: an identifier for the region (must be unique).

    • sgRNA_SEQUENCE (OPTIONAL): sgRNA sequence used for this genomic segment without the PAM sequence. If not available, enter NA.

    • EXPECTED_SEGMENT_AFTER_HDR (OPTIONAL): expected genomic segment sequence in case of HDR. If more than one, separate by commas and not spaces. If not available, enter NA.

    • CODING_SEQUENCE (OPTIONAL): Subsequence(s) of the genomic segment corresponding to coding sequences. If more than one, separate by commas and not spaces. If not available, enter NA.

A file in the correct format should look like this (column entries must be separated by tabs):

chr1 65118211 65118261 R1 CTACAGAGCCCCAGTCCTGG NA NA

chr6 51002798 51002820 R2 NA NA NA

Note: no column titles should be entered. As you may have noticed, this file is just a BED file with extra columns. For this reason, a normal BED file with 4 columns is also accepted by this utility.

  4. Optionally, the full path of a gene annotations file from UCSC. You can download this file from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start) selecting as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (something like: /genomes/human_hg19/gencode_v19.gz)
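The layout of the regions description file can be checked before a run with a small parser. This is only a sketch: the function name parse_region_line and the lower-case field names are hypothetical, and the rule that bpend must be greater than bpstart is an assumption, not something CRISPRessoWGS is documented to enforce.

```python
# Column order as listed above; the first four are required.
COLUMNS = [
    "chr_id", "bpstart", "bpend", "region_name",
    "sgrna_sequence", "expected_segment_after_hdr", "coding_sequence",
]


def parse_region_line(line):
    """Parse one tab-separated line of a regions description file."""
    fields = line.rstrip("\n").split("\t")
    if not 4 <= len(fields) <= len(COLUMNS):
        raise ValueError("expected 4 to 7 tab-separated columns, got %d" % len(fields))
    # A plain 4-column BED line is accepted: pad the optional columns with NA.
    fields += ["NA"] * (len(COLUMNS) - len(fields))
    record = dict(zip(COLUMNS, fields))
    record["bpstart"] = int(record["bpstart"])
    record["bpend"] = int(record["bpend"])
    if record["bpend"] <= record["bpstart"]:
        raise ValueError("bpend must be greater than bpstart")  # assumed rule
    for key in COLUMNS[4:]:
        if record[key] == "NA":
            record[key] = None   # NA marks a value that is not available
    return record


if __name__ == "__main__":
    print(parse_region_line("chr1\t65118211\t65118261\tR1\tCTACAGAGCCCCAGTCCTGG\tNA\tNA"))
    print(parse_region_line("chr6\t51002798\t51002820\tR2"))
```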

Example:

Using Bioconda:

CRISPRessoWGS -b WGS/50/50_sorted_rmdup_fixed_groups.bam -f WGS_TEST.txt -r /GENOMES/mm9/mm9.fa --gene_annotations ensemble_mm9.txt.gz --name CRISPR_WGS_SRR1542350

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoWGS -b WGS/50/50_sorted_rmdup_fixed_groups.bam -f WGS_TEST.txt -r /GENOMES/mm9/mm9.fa --gene_annotations ensemble_mm9.txt.gz --name CRISPR_WGS_SRR1542350

The output from these files will consist of:

  1. REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt: this file contains the same information provided in the input description file, plus some additional columns:

    a. sequence: sequence in the reference genome for the region specified.

    b. gene_overlapping: gene/s overlapping the region specified.

    c. n_reads: number of reads recovered for the region.

    d. bam_file_with_reads_in_region: file containing only the subset of the reads that overlap, also partially, with the region. This file is indexed and can be easily loaded for example on IGV for visualization of single reads or for the comparison of two conditions. For example, in the figure below (fig X) we show reads mapped to a region inside the coding sequence of the gene Crygc subjected to NHEJ (CRISPR_WGS_SRR1542350) vs reads from a control experiment (CONTROL_WGS_SRR1542349).

    e. fastq.gz_file_trimmed_reads_in_region: file containing only the subset of reads fully covering the specified regions, and trimmed to match the sequence in that region. These reads are used for the subsequent analysis with CRISPResso.

  2. ANALYZED_REGIONS (folder): this folder contains all the BAM and FASTQ files, one for each region analyzed.

  3. A set of folders with the CRISPResso report on the regions provided in input with enough reads (the default setting is to have at least 10 reads, but the parameter can be adjusted with the option --min_reads_to_use_region).

  4. CRISPRessoWGS_RUNNING_LOG.txt: execution log and messages for the external utilities called.

This utility is particularly useful for investigating and quantifying mutation frequency in a list of potential target or off-target sites, coming for example from prediction tools or from other orthogonal assays.

Parameter List

-b or --bam_file: WGS aligned bam file. (default: 'bam filename')

-f or --region_file: Regions description file. A BED format file containing the regions to analyze, one per line. The REQUIRED columns are: chr_id (chromosome name), bpstart (start position), bpend (end position). The optional columns are: name (a unique identifier for the region), guide_seq, expected_hdr_amplicon_seq, coding_seq (see CRISPResso help for more details on these last 3 parameters).

-r or --reference_file: A FASTA format reference file (for example hg19.fa for the human genome). (default: '')

--min_reads_to_use_region: Minimum number of reads that align to a region to perform the CRISPResso analysis. (default: 10)

--gene_annotations: Gene Annotation Table from UCSC Genome Browser Tables http://genome.ucsc.edu/cgi-bin/hgTables?command=start, please select as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (default: '')

--crispresso_command: CRISPResso command to call. (default: 'CRISPResso')

CRISPRessoCompare

CRISPRessoCompare is a utility for the comparison of a pair of CRISPResso analyses. CRISPRessoCompare produces a summary of differences between two conditions, for example a CRISPR treated and an untreated control sample (see figure below). Informative plots are generated showing the differences in editing rates and localization within the reference amplicon.

Usage

To run CRISPRessoCompare you must provide:

  1. Two output folders generated with CRISPResso using the same reference amplicon and settings but on different datasets.
  2. Optionally a name for each condition to use for the plots, and the name of the output folder

Example:

Using Bioconda:

CRISPRessoCompare -n1 "VEGFA CRISPR" -n2 "VEGFA CONTROL"  -n VEGFA_Site_1_SRR10467_VS_SRR1046787 CRISPResso_on_VEGFA_Site_1_SRR1046762/ CRISPResso_on_VEGFA_Site_1_SRR1046787/

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoCompare -n1 "VEGFA CRISPR" -n2 "VEGFA CONTROL"  -n VEGFA_Site_1_SRR10467_VS_SRR1046787 CRISPResso_on_VEGFA_Site_1_SRR1046762/ CRISPResso_on_VEGFA_Site_1_SRR1046787/

The output will consist of:

  1. Comparison_Efficiency.pdf: a figure containing a comparison of the edit frequencies for each category (NHEJ, MIXED NHEJ-HDR and HDR), as well as the net effect obtained by subtracting the second sample (second folder in the command line) from the first sample (first folder in the command line).
  2. Comparison_Combined_Insertion_Deletion_Substitution_Locations.pdf: a figure showing the average mutation profile for the two samples on the same scale, and their difference with the same convention used in the previous figure (first sample – second sample).
  3. CRISPRessoCompare_significant_base_counts.txt: a text file reporting the number of bases for each amplicon and in the quantification window for each amplicon that were significantly enriched for Insertions, Deletions, and Substitutions, as well as All Modifications (Fisher's exact test, Bonferroni corrected p-values).
  4. CRISPRessoCompare_RUNNING_LOG.txt: detailed execution log.

Parameter List

crispresso_output_folder_1: First output folder with CRISPResso analysis (Required)

crispresso_output_folder_2: Second output folder with CRISPResso analysis (Required)

-n or --name: Output name. (default: '')

-n1 or --sample_1_name: Sample 1 name

-n2 or --sample_2_name: Sample 2 name

-o or --output_folder: Output folder name

--reported_qvalue_cutoff: Q-value cutoff for significance in tests for differential editing. Each base position is tested (for insertions, deletions, substitutions, and all modifications) using Fisher's exact test, followed by Bonferroni correction. The number of bases with a significance below this threshold in the quantification window are counted and reported in the output summary. (default: 0.05)

--min_frequency_alleles_around_cut_to_plot: Minimum % reads required to report an allele in the alleles table plot. (default: 0.2)

--max_rows_alleles_around_cut_to_plot: Maximum number of rows to report in the alleles table plot. (default: 50)

--suppress_report: Suppress output report. (default: False)

--place_report_in_output_folder: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output. (default: False)
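The per-base testing described for --reported_qvalue_cutoff can be sketched in pure Python. This is an illustration of the technique (a two-sided Fisher's exact test on a 2x2 table of modified and unmodified read counts at each position, followed by Bonferroni correction), not CRISPRessoCompare's own code. The function names are hypothetical, and using the number of tested positions as the Bonferroni factor is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


def count_significant_positions(mods_1, total_1, mods_2, total_2, cutoff=0.05):
    """Count positions whose Bonferroni-corrected p-value is below cutoff.

    mods_1 / mods_2: per-position counts of modified reads in each sample.
    total_1 / total_2: number of reads in each sample.
    """
    n_tests = len(mods_1)   # assumed correction factor: one test per position
    significant = 0
    for m1, m2 in zip(mods_1, mods_2):
        p = fisher_exact_two_sided(m1, total_1 - m1, m2, total_2 - m2)
        if min(1.0, p * n_tests) < cutoff:
            significant += 1
    return significant


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(count_significant_positions([40, 2, 1], 100, [3, 2, 0], 100))
```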

CRISPRessoPooledWGSCompare

CRISPRessoPooledWGSCompare is an extension of the CRISPRessoCompare utility allowing the user to run and summarize multiple CRISPRessoCompare analyses where several regions are analyzed in two different conditions, as in the case of the CRISPRessoPooled or CRISPRessoWGS utilities.

Usage

To run CRISPRessoPooledWGSCompare you must provide:

  1. Two output folders generated with CRISPRessoPooled or CRISPRessoWGS using the same reference amplicon and settings but on different datasets.
  2. Optionally a name for each condition to use for the plots, and the name of the output folder

Example:

Using Bioconda:

CRISPRessoPooledWGSCompare CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/ CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046787/ -n1 SRR1046762 -n2 SRR1046787 -n AMPLICONS_AND_GENOME_SRR1046762_VS_SRR1046787

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooledWGSCompare CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/ CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046787/ -n1 SRR1046762 -n2 SRR1046787 -n AMPLICONS_AND_GENOME_SRR1046762_VS_SRR1046787

The output from these files will consist of:

  1. COMPARISON_SAMPLES_QUANTIFICATION_SUMMARIES.txt: this file contains a summary of the quantification for each of the two conditions for each region and their difference (read counts and percentages for the various classes: Unmodified, NHEJ, MIXED NHEJ-HDR and HDR).
  2. A set of folders with CRISPRessoCompare reports on the common regions with enough reads in both conditions.
  3. CRISPRessoPooledWGSCompare_significant_base_count_summary.txt: a text file summarizing for each sample and amplicon in both conditions the number of bases for each amplicon and in the quantification window for each amplicon that were significantly enriched for Insertions, Deletions, and Substitutions, as well as All Modifications (Fisher's exact test, Bonferroni corrected p-values).
  4. CRISPRessoPooledWGSCompare_RUNNING_LOG.txt: detailed execution log.

Parameter List

crispresso_pooled_wgs_output_folder_1: First output folder with CRISPRessoPooled or CRISPRessoWGS analysis (Required)

crispresso_pooled_wgs_output_folder_2: Second output folder with CRISPRessoPooled or CRISPRessoWGS analysis (Required)

-p or --n_processes: Number of processes to use for this analysis. Can be set to 'max'.

-n or --name: Output name. (default: '')

-n1 or --sample_1_name: Sample 1 name

-n2 or --sample_2_name: Sample 2 name

-o or --output_folder: Output folder name

--reported_qvalue_cutoff: Q-value cutoff for significance in tests for differential editing. Each base position is tested (for insertions, deletions, substitutions, and all modifications) using Fisher's exact test, followed by Bonferroni correction. The number of bases with a significance below this threshold in the quantification window are counted and reported in the output summary. (default: 0.05)

--min_frequency_alleles_around_cut_to_plot: Minimum % reads required to report an allele in the alleles table plot. (default: 0.2)

--max_rows_alleles_around_cut_to_plot: Maximum number of rows to report in the alleles table plot. (default: 50)

--suppress_report: Suppress output report. (default: False)

--place_report_in_output_folder: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output. (default: False)

CRISPRessoAggregate

CRISPRessoAggregate is a utility to combine the analysis of several CRISPResso runs. The analyses are summarized and editing rates are optionally visualized in a summary report.

Usage

CRISPRessoAggregate has the following parameters:

--name: Output name of the report (required)

--prefix: Prefix for CRISPResso folders to aggregate (may be specified multiple times)

--suffix: Suffix for CRISPResso folders to aggregate

--min_reads_for_inclusion: Minimum number of reads for a run to be included in the run summary (default: 0)

--place_report_in_output_folder: If true, report will be written inside the CRISPResso output folder. By default, the report will be written one directory up from the report output (default: False)

--suppress_report: Suppress output report (default: False)

--suppress_plots: Suppress output plots (default: False)

To run CRISPRessoAggregate you must provide the --name parameter; by default, CRISPResso folders in the current directory will be summarized. To summarize folders in other locations, provide these locations using the '--prefix' parameter.

Example:

Using Bioconda:

CRISPRessoAggregate --name "VEGFA" --prefix CRISPRessoRuns/VEGFA/

Using Docker:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoAggregate --name "VEGFA" --prefix CRISPRessoRuns/VEGFA/

The output will consist of:

  1. CRISPResso2Aggregate_report.html: an HTML file containing links to all aggregated runs.
  2. CRISPRessoAggregate_amplicon_information.txt: A tab-separated file with a line for each amplicon that was found in any run. The 'Amplicon Name' column shows the unique name for this amplicon sequence. 'Number of sources' shows how many runs the amplicon was found in, and 'Amplicon sources' show which run folders the amplicon was found in, as well as the name of the amplicon in that run.
  3. CRISPRessoAggregate_mapping_statistics.txt: A tab-separated file showing the number of reads sequenced and mapped for each run.
  4. CRISPRessoAggregate_quantification_of_editing_frequency.txt: A tab-separated file with the number of reads and edits for each run folder. Data from run folders with multiple amplicons show the sum totals for all amplicons.
  5. CRISPRessoAggregate_quantification_of_editing_frequency_by_amplicon.txt: A tab-separated file showing the number of reads and edits for each amplicon for each run folder. Data from run folders with multiple amplicons will appear on multiple lines, with one line per amplicon.
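The tab-separated outputs listed above can be read with standard tools. The sketch below lists amplicons found in more than one run from CRISPRessoAggregate_amplicon_information.txt. It assumes the header spells the columns exactly as quoted above ('Amplicon Name', 'Number of sources'); the function name and the sample rows are made up for illustration.

```python
import csv
import io


def amplicons_in_multiple_runs(handle, min_sources=2):
    """Return names of amplicons found in at least min_sources runs.

    handle: an open text file (or file-like object) with a header line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["Amplicon Name"] for row in reader
            if int(row["Number of sources"]) >= min_sources]


if __name__ == "__main__":
    # Hypothetical file content; a real run would open the output file instead.
    example = io.StringIO(
        "Amplicon Name\tNumber of sources\tAmplicon sources\n"
        "VEGFA_site_1\t2\trun_A(VEGFA),run_B(VEGFA)\n"
        "EMX1\t1\trun_A(EMX1)\n"
    )
    print(amplicons_in_multiple_runs(example))
```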

CRISPRessoPro

CRISPResso is an open source tool for free use by academics. However, for-profit organizations are required to purchase a license to use CRISPResso. As a part of this license, organizations gain access to the CRISPRessoPro package which supplements CRISPResso with several useful features:

  • Interactive and improved plots using D3 and Plotly
  • Customizable colors
  • Customizable warnings based on potential issues in results (guardrails)

Installation

To add CRISPRessoPro to CRISPResso contact Edilytics - [email protected]

D3 and Plotly

If CRISPRessoPro is installed, by default reports will include interactive plots. To use matplotlib for figures add the --use_matplotlib argument.

Customizable Colors and Guardrails

If CRISPRessoPro is installed, by default the colors and guardrails will remain the same as CRISPResso. To alter this, use the --config_file argument and a filepath to a .json file with the following format:

{
  "colors": {
    "Substitution": "#0000FF",
    "Insertion": "#008000",
    "Deletion": "#FF0000",
    "A": "#7FC97F",
    "T": "#BEAED4",
    "C": "#FDC086",
    "G": "#FFFF99",
    "N": "#C8C8C8",
    "-": "#1E1E1E"
  },
  "guardrails": {
    "min_total_reads": 10000,
    "aligned_cutoff": 0.9,
    "alternate_alignment": 0.3,
    "min_ratio_of_mods_in_to_out": 0.01,
    "modifications_at_ends": 0.01,
    "outside_window_max_sub_rate": 0.002,
    "max_rate_of_subs": 0.3,
    "guide_len": 19,
    "amplicon_len": 50,
    "amplicon_to_read_length": 1.5
  }
}

The values above are the defaults, shown as an example; change them as desired to any color or guardrail specification.
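A configuration file in this format can be generated rather than edited by hand. The sketch below starts from the default values shown above and overrides selected entries. The helper name make_config and the output filename are hypothetical. The hex-color check is a conservative assumption, since the accepted color formats are not stated here, and whether a partial file would also be accepted is not stated either, so the sketch always writes every key.

```python
import copy
import json
import re

# Default values copied from the example configuration above.
DEFAULT_CONFIG = {
    "colors": {
        "Substitution": "#0000FF", "Insertion": "#008000", "Deletion": "#FF0000",
        "A": "#7FC97F", "T": "#BEAED4", "C": "#FDC086", "G": "#FFFF99",
        "N": "#C8C8C8", "-": "#1E1E1E",
    },
    "guardrails": {
        "min_total_reads": 10000, "aligned_cutoff": 0.9,
        "alternate_alignment": 0.3, "min_ratio_of_mods_in_to_out": 0.01,
        "modifications_at_ends": 0.01, "outside_window_max_sub_rate": 0.002,
        "max_rate_of_subs": 0.3, "guide_len": 19, "amplicon_len": 50,
        "amplicon_to_read_length": 1.5,
    },
}

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def make_config(color_overrides=None, guardrail_overrides=None):
    """Return a full configuration with the given entries overridden."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    sections = (("colors", color_overrides), ("guardrails", guardrail_overrides))
    for section, overrides in sections:
        for key, value in (overrides or {}).items():
            if key not in config[section]:
                raise KeyError("unknown %s entry: %r" % (section, key))
            if section == "colors" and not HEX_COLOR.match(value):
                raise ValueError("expected a #RRGGBB color, got %r" % value)
            config[section][key] = value
    return config


if __name__ == "__main__":
    config = make_config({"Deletion": "#AA0000"}, {"min_total_reads": 5000})
    with open("crispresso_pro_config.json", "w") as handle:
        json.dump(config, handle, indent=2)
```

The resulting file path would then be passed to CRISPResso with the --config_file argument.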

crispresso2's People

Contributors

blasvicco, colelyman, dennydai, dharjanto, kclem, matandro, mbiokyle29, mbowcut2, natecarlson, pehgp, ronaldhause, snicker7, sshen8, swrosati, trevormartinj7

crispresso2's Issues

CRISPResso2 was not installed properly via conda install

Hi,
I installed CRISPResso2 by creating a new environment named crispresso2_env with CRISPResso2.

(base) surulin@SurudeMacBook:~$ conda create -n crispresso2_env -c bioconda crispresso2 python=2.7

But I could not verify the installation by the command "CRISPResso -h"
I got the following error message.

Describe the bug
(base) surulin@SurudeMacBook:~$ conda activate crispresso2_env
(crispresso2_env) surulin@SurudeMacBook:~$ CRISPResso -h
Traceback (most recent call last):
  File "/Users/surulin/anaconda3/envs/crispresso2_env/bin/CRISPResso", line 11, in <module>
    load_entry_point('CRISPResso2==2.0.30', 'console_scripts', 'CRISPResso')()
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 37, in <module>
    from CRISPResso2 import CRISPRessoPlot
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/CRISPResso2/CRISPRessoPlot.py", line 18, in <module>
    import seaborn as sns
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/__init__.py", line 6, in <module>
    from .rcmod import *
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/rcmod.py", line 8, in <module>
    from . import palettes, _orig_rc_params
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/palettes.py", line 12, in <module>
    from .utils import desaturate, set_hls_values, get_color_cycle
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/utils.py", line 8, in <module>
    from scipy import stats
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/scipy/stats/__init__.py", line 345, in <module>
    from .stats import *
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/scipy/stats/stats.py", line 169, in <module>
    import scipy.special as special
  File "/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/__init__.py", line 640, in <module>
    from ._ufuncs import *
ImportError: dlopen(/Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/_ufuncs.so, 2): Library not loaded: @rpath/libopenblas.dylib
  Referenced from: /Users/surulin/anaconda3/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/_ufuncs.so
  Reason: image not found

CRISPResso "ERROR: a float is required" on negative control input samples

Hi Kendell and Luca,

I am attempting to run CRISPResso version 2.0.29 on multiple paired-end samples using the following command:

CRISPResso --fastq_r1 $fastqR1 --fastq_r2 $fastqR2 \
--amplicon_seq $ref \
--guide_seq $grna --expected_hdr_amplicon_seq $hdr \
--name $sample \
--trimmomatic_options_string ILLUMINACLIP:adapters.fa:0:90:10 \
--max_paired_end_reads_overlap 200 \
--exclude_bp_from_left 26 --exclude_bp_from_right 21 \
--plot_window_size 50 --min_frequency_alleles_around_cut_to_plot 0.1 \
--dump --debug

The multiple samples I am running use the same inputs (except for --fastq_r1, --fastq_r2, and --name, which I populate using another .txt file and bash) because I have many positive and negative controls to run alongside the experimental sample. Note I am using CRISPResso and not CRISPRessoBatch at this stage.

For the majority of the samples, CRISPResso runs successfully and generates all the expected outputs. However for some negative control samples, I obtain the following error as per the end of the output file CRISPResso_RUNNING_LOG.txt:

Saving processed data...
Traceback (most recent call last):
  File "/software/miniconda2/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 1810, in main
    vals.extend([str(x) for x in [round(unmod_pct,8),round(mod_pct,8),n_aligned,N_TOTAL,n_unmod,n_mod,n_discarded,n_insertion,n_deletion,n_substitution,n_only_insertion,n_only_deletion,n_only_substitution,n_insertion_and_deletion,n_insertion_and_substitution,n_deletion_and_substitution,n_insertion_and_deletion_and_substitution]])
TypeError: a float is required

Unexpected error, please check your input.

ERROR: a float is required

CRISPResso version 1.0.13 runs successfully on these same negative control samples that failed, and using this earlier version of CRISPResso, the output Quantification_of_editing_frequency.txt reported 0 reads with HDR. So I am suspecting that line 1810 of the CRISPRessoCORE.py script of CRISPResso version 2.0.29 is attempting to round 0 and thus obtains this error?

Your help with resolving this error would be much appreciated. Please let me know if you require any further information.

Many thanks,
Rebecca

After installation -- pkg_resources.DistributionNotFound error for 'pyparsing<3' causes CRISPResso to quit

CRISPResso Team,

I recently installed CRISPResso2 on our school's Linux server. The installation completed; however, when I try to run the help command to check that it is operable, it gives the following error: pkg_resources.DistributionNotFound: The 'pyparsing<3' distribution was not found and is required by CRISPResso2. Yet pyparsing 2.4.7 is listed as an installed package. I have attempted uninstalling and reinstalling the crispresso2 package, but that doesn't fix it. I have also tried updating the package (despite the fact that pyparsing 2.4.7 should be the pyparsing version <3 that is required). Nothing has fixed it.

Could it be that the python version for the conda info output is python 3.5 and not python 2.7? Any help you can give is greatly appreciated.

Thank you!
James

Here is the traceback:
(crispresso2_env) [jimcdonald@cpu008 ~]$ CRISPResso -h
Traceback (most recent call last):
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/bin/CRISPResso", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3251, in <module>
    @_call_aside
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(requires)
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/SMHS/home/jimcdonald/.conda/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pyparsing<3' distribution was not found and is required by CRISPResso2
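
One way to check which pyparsing the environment's interpreter actually sees is to ask the packaging metadata directly. This is a minimal sketch, not part of CRISPResso2; the helper name `installed_version` is made up for illustration, and it should be run with the same `python` that sits next to the failing `CRISPResso` script.

```python
# Sketch: report which version of a distribution this interpreter can see.
import sys


def installed_version(dist_name):
    """Return the installed version string of dist_name, or None if not found."""
    try:
        # Standard library on Python 3.8 and later.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Older interpreters (including Python 2.7) get pkg_resources from setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Show which interpreter is running and what it finds for pyparsing.
print(sys.executable)
print("pyparsing:", installed_version("pyparsing"))
```

If this prints `None` while `conda list` shows pyparsing, the package metadata is probably not visible to that interpreter, which would be consistent with the error above.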

Here is the conda information:
(crispresso2_env) [jimcdonald@cpu008 ~]$ conda info
Current conda install:

         platform : linux-64
    conda version : 4.1.11
conda-env version : 2.5.2

conda-build version : not installed
python version : 3.5.2.final.0
requests version : 2.10.0
root environment : /c1/apps/miniconda/miniconda3 (read only)
default environment : /SMHS/home/jimcdonald/.conda/envs/crispresso2_env
envs directories : /SMHS/home/jimcdonald/.conda/envs
/SMHS/home/jimcdonald/envs
/c1/apps/miniconda/miniconda3/envs
package cache : /SMHS/home/jimcdonald/.conda/envs/.pkgs
/SMHS/home/jimcdonald/envs/.pkgs
/c1/apps/miniconda/miniconda3/pkgs
channel URLs : https://conda.anaconda.org/conda-forge/linux-64/
https://conda.anaconda.org/conda-forge/noarch/
https://conda.anaconda.org/bioconda/linux-64/
https://conda.anaconda.org/bioconda/noarch/
https://repo.continuum.io/pkgs/free/linux-64/
https://repo.continuum.io/pkgs/free/noarch/
https://repo.continuum.io/pkgs/pro/linux-64/
https://repo.continuum.io/pkgs/pro/noarch/
config file : /SMHS/home/jimcdonald/.condarc
offline mode : False
is foreign system : False

Here is the list of packages:
(crispresso2_env) [jimcdonald@cpu008 ~]$ conda list

packages in environment at /SMHS/home/jimcdonald/.conda/envs/crispresso2_env:

argparse 1.4.0 py27_0 bioconda
bowtie2 2.3.5.1 py27he513fc3_0 bioconda
crispresso2 2.0.40 py27h3dcb392_0 bioconda
flash 1.2.11 hed695b0_5 bioconda
samtools 1.7 1 bioconda
trimmomatic 0.39 1 bioconda
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 0_gnu conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
backports_abc 0.5 py_1 conda-forge
blas 1.1 openblas conda-forge
bzip2 1.0.8 h516909a_2 conda-forge
ca-certificates 2020.6.20 hecda079_0 conda-forge
cairo 1.16.0 hcf35c78_1003 conda-forge
certifi 2019.11.28 py27h8c360ce_1 conda-forge
curl 7.71.1 he644dc0_3 conda-forge
cycler 0.10.0 py_2 conda-forge
fontconfig 2.13.1 h86ecdb6_1001 conda-forge
freetype 2.10.2 he06d7ca_0 conda-forge
functools32 3.2.3.2 py_3 conda-forge
futures 3.3.0 py27h8c360ce_1 conda-forge
gettext 0.19.8.1 hc5be6a0_1002 conda-forge
giflib 5.2.1 h516909a_2 conda-forge
glib 2.65.0 h6f030ca_0 conda-forge
graphite2 1.3.13 he1b5a44_1001 conda-forge
harfbuzz 2.4.0 h9f30f68_3 conda-forge
icu 64.2 he1b5a44_1 conda-forge
jinja2 2.10 py_1 conda-forge
jpeg 9d h516909a_0 conda-forge
kiwisolver 1.1.0 py27h9e3301b_1 conda-forge
krb5 1.17.1 hfafb76e_1 conda-forge
lcms2 2.11 hbd6801e_0 conda-forge
ld_impl_linux-64 2.34 h53a641e_7 conda-forge
libblas 3.8.0 14_openblas conda-forge
libcblas 3.8.0 14_openblas conda-forge
libcurl 7.71.1 hcdd3856_3 conda-forge
libedit 3.1.20191231 h46ee950_1 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc 7.2.0 h69d50b8_2 conda-forge
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgfortran-ng 7.5.0 hdf63c60_9 conda-forge
libgomp 9.2.0 h24d8f2e_2 conda-forge
libiconv 1.15 h516909a_1006 conda-forge
liblapack 3.8.0 14_openblas conda-forge
liblapacke 3.8.0 14_openblas conda-forge
libopenblas 0.3.7 h5ec1e0e_6 conda-forge
libpng 1.6.37 hed695b0_1 conda-forge
libssh2 1.9.0 hab1572f_4 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hc7e4089_6 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libwebp-base 1.1.0 h516909a_3 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxml2 2.9.10 hee79883_0 conda-forge
lz4-c 1.9.2 he1b5a44_1 conda-forge
markupsafe 1.1.1 py27hdf8410d_1 conda-forge
matplotlib-base 2.1.2 py27h250f245_1 conda-forge
ncurses 6.2 he1b5a44_1 conda-forge
numpy 1.16.2 py27_blas_openblash1522bff_0 [blas_openblas] conda-forge
openblas 0.3.3 h9ac9557_1001 conda-forge
openjdk 11.0.1 hacce0ff_1021 conda-forge
openssl 1.1.1g h516909a_0 conda-forge
pandas 0.24.0 py27hf484d3e_0 conda-forge
patsy 0.5.1 py_0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
perl 5.26.2 h516909a_1006 conda-forge
pip 20.1.1 pyh9f0ad1d_0 conda-forge
pixman 0.38.0 h516909a_1003 conda-forge
pthread-stubs 0.4 h14c3975_1001 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
python 2.7.15 h5a48372_1011_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 2.7 1_cp27mu conda-forge
pytz 2020.1 pyh9f0ad1d_0 conda-forge
readline 8.0 he28a2e2_2 conda-forge
scipy 1.1.0 py27_blas_openblash1522bff_1202 [blas_openblas] conda-forge
seaborn 0.9.0 py_2 conda-forge
setuptools 44.0.0 py27_0 conda-forge
singledispatch 3.4.0.3 py27_1000 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.32.3 hcee41ef_1 conda-forge
statsmodels 0.10.2 py27hc1659b7_0 conda-forge
subprocess32 3.5.4 py27h516909a_0 conda-forge
tbb 2020.1 hc9558a2_0 conda-forge
tk 8.6.10 hed695b0_0 conda-forge
tornado 5.1.1 py27h14c3975_1000 conda-forge
wheel 0.34.2 py_1 conda-forge
xorg-fixesproto 5.0 h14c3975_1002 conda-forge
xorg-inputproto 2.3.2 h14c3975_1002 conda-forge
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.9 h516909a_0 conda-forge
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxfixes 5.0.3 h516909a_1004 conda-forge
xorg-libxi 1.7.10 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxtst 1.2.3 h14c3975_1002 conda-forge
xorg-recordproto 1.14.2 h516909a_1002 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h14c3975_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.5 h6597ccf_1 conda-forge

"Your version of CRISPResso2 is out of date"

Hello,
I just tried to install CRISPResso2 using bioconda, and I keep getting the following error message:
"Your version of CRISPResso2 is out of date. Please download a new version."
I downloaded the program minutes ago, so I thought I had the latest version. Is there a different way I should be installing, or a way to update?
Thanks,
Erin

PooledWGSCompare bugs

To get CRISPRessoPooledWGSCompare to run I had to make a couple of changes to CRISPRessoPooledWGSCompareCORE.py:

Changed line 59 from
CRISPResso_compare_to_call = os.path.join(os.path.dirname(_ROOT),'CRISPRessoCompare.py')
to
CRISPResso_compare_to_call = 'CRISPRessoCompare'

Changed line 138 from
df_comp=df_quant_1.set_index(['Name','Amplicon']).join(df_quant_2.set_index(['Name','Amplicon']),lsuffix='%s' % args.sample_1_name,rsuffix='%s' % args.sample_2_name)
to
df_comp=df_quant_1.set_index(['Name','Name']).join(df_quant_2.set_index(['Name','Name']),lsuffix='%s' % args.sample_1_name,rsuffix='%s' % args.sample_2_name)
because the SAMPLES_QUANTIFICATION_SUMMARY.txt file generated with CRISPRessoPooled doesn’t have an ‘Amplicon’ column. Changing Amplicon to Name seemed to get things to work, but I'm not sure what the proper fix is.
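
A possible alternative to hard-coding the key columns is to use whichever of 'Name' and 'Amplicon' both summary files actually contain. The sketch below is an assumption about what a fix could look like, not the project's code; `shared_index_columns` is a hypothetical helper.

```python
def shared_index_columns(columns_1, columns_2, preferred=("Name", "Amplicon")):
    """Return the preferred key columns that exist in both tables, in order."""
    common = set(columns_1) & set(columns_2)
    keys = [col for col in preferred if col in common]
    if not keys:
        # Without any shared key there is nothing sensible to join on.
        raise ValueError("no shared key column among %r" % (preferred,))
    return keys
```

The returned list could then be passed to both `set_index` calls in place of the fixed `['Name','Amplicon']`.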

problem running crispresso with expected_hdr_amplicon_seq

Hi,
I can run CRISPRessoBatch successfully, but I get an error message for each sample where I provide an expected HDR sequence in the argument --expected_hdr_amplicon_seq.

It all works fine if I leave expected_hdr_amplicon_seq empty and leave the rest unchanged.

From the error message I can't work out the source of the error.

Here is an example.

i: 20 j: 18446744073709551615
currMatrix:1885958656
seqj: seqi: CGTGTACCAGCTGAGAGACT
CRITICAL @ Thu, 06 Aug 2020 15:02:28:
Unexpected error, please check your input.

ERROR: ('wtf4!:pointer: %i', 20)

WARNING @ Thu, 06 Aug 2020 15:02:28:
CRISPResso command failed (return value 255) on batch #1: "CRISPResso -o crispresso_output/CRISPRessoBatch_on_crispresso_config_HDR_test --name AML00020G09_1 --needleman_wunsch_gap_extend -2 --expand_ambiguous_alignments --aln_seed_count 5 --amplicon_name Reference --amplicon_seq GGTTGGGGCAAAGAGGGAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCAGAACCCTGACCCTGCCGTGTACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCTGTCTGCCTATTCACCGATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATCACAGACAAAACTGTGCTAGACATGAGGTCTATGGACTTCAAGAGCAACAGTGCTGTGGCCTGGAGCAACAAATCT --max_rows_alleles_around_cut_to_plot 50 --prime_editing_pegRNA_extension_quantification_window_size 5 --fastq_r1 "AML-00020-G09_S71_L001_R1_001.fastq.gz" --suppress_plots --quantification_window_size 10 --quantification_window_center 1 --trimmomatic_command trimmomatic --fastq_r2 "AML-00020-G09_S71_L001_R2_001.fastq.gz" --conversion_nuc_from C --min_bp_quality_or_N 0 --default_min_aln_score 60 --needleman_wunsch_gap_incentive 1 --min_paired_end_reads_overlap 10 --plot_window_size 20 --prime_editing_pegRNA_scaffold_min_match_length 1 --aln_seed_min 2 --aln_seed_len 10 --expected_hdr_amplicon_seq GGTTGGGGCAAAGAGGGAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCAGAACCCTGACCCTGCTCCAGTGACAAGTCTGTCTGCCTATTCACCGATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATCACAGACAAAACTGTGCTAGACATGAGGTCTATGGACTTCAAGAGCAACAGTGCTGTGGCCTGGAGCAACAAATCT --needleman_wunsch_gap_open -20 --max_paired_end_reads_overlap 150 --guide_seq GAGTCTCTCAGCTGGTACACG --conversion_nuc_to T --flexiguide_homology 80 --flash_command flash --min_single_bp_quality 0 --exclude_bp_from_left 15 --needleman_wunsch_aln_matrix_loc EDNAFULL --min_average_read_quality 0 --min_frequency_alleles_around_cut_to_plot 0.2 --exclude_bp_from_right 15"

Any idea what it could be?

conda install bug

Describe the bug
When installing with conda, some errors come from decorator and numpy.

To reproduce
conda create -n crispresso python=2.7 CRISPResso2

Debug output
decorator needs to be newer than 4.3.0, but conda installs 4.1.2; numpy reports "numpy.dtype size changed, may indicate binary incompatibility".

Frequency in indel histogram and number of reads in alleles frequency table

I checked two output files, Indel_histogram and Alleles_frequency_table_around_sgRNA. My understanding was that the numbers should be consistent between the two tables.
For example,
+1 insertion (from Indel_histogram) == (n_inserted == 1 & n_deleted == 0) (from Alleles_freq)
-1 deletion == (n_inserted == 0 & n_deleted == 1)

However, the numbers are not consistent; there is usually a small difference. Could you please explain the reason?
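
For reference, the allele-table side of this comparison can be tallied as below. This is only a sketch for Python 3: the column names `n_inserted` and `n_deleted` come from the post, `#Reads` is assumed to be the read-count column, and the table contents are toy data rather than real output.

```python
import csv
import io


def count_reads(table_text, n_inserted, n_deleted):
    """Sum the read counts of alleles with exactly the given indel sizes."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    total = 0
    for row in reader:
        # int(float(...)) tolerates values written as "1" or "1.0".
        ins = int(float(row["n_inserted"]))
        dele = int(float(row["n_deleted"]))
        if ins == n_inserted and dele == n_deleted:
            total += int(float(row["#Reads"]))
    return total


# Toy table with the assumed column layout.
example = (
    "Aligned_Sequence\tn_deleted\tn_inserted\tn_mutated\t#Reads\n"
    "ACGTA\t0\t1\t0\t12\n"
    "ACG-A\t1\t0\t0\t7\n"
    "ACGTT\t0\t1\t1\t3\n"
)
```

Note that rows with a substitution in addition to the insertion (the third toy row) are still counted under this filter, which is one thing to check when comparing against the histogram.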

Join mutations type to Alleles frequency table, adjacent sgRNAs workflow

Hi

I have three questions about features I cannot see in the documentation but wonder if they exist:

  1. When a coding sequence is provided, Frameshift_analysis.txt gives a summary of the number of modified reads that produce frameshift, in-frame or non-coding mutations. Is it possible to get allele-specific mutation calls from this analysis and join them with the table in Alleles_frequency_table_around_sgRNA_NNNNN.txt?

  2. What is your recommended workflow when there are two adjacent sgRNAs on a single amplicon? I realize that one could increase the quantification_window_size to encompass both sgRNAs, but it would be nice to compare the editing of one vs the second and the level of dual edits. I guess one could analyze each separately and compare, but I imagine one would need to retain read IDs. Is this possible?

  3. Intuitively I feel that doing an analysis without providing a sgRNA should provide the same results as when I do provide one and set the quantification_window_size to 0. However when I do I get very slight differences which I think is due to very small differences in the number of reads aligning (24478 vs 24474 using the example NHEJ test datasets). I've done each one twice and I get the same thing. Why does providing a sgRNA slightly alter the read alignment?

Many thanks for a great tool!

too many values to unpack

I used bbmap to map the reads, as it allows large insertions and deletions; in my tests bowtie and bwa do not perform well at this and those sites are lost. I then ran the tool and it says "too many values to unpack". The locations file is shown below. Coverage is very high.

contig1 68000 70000 cad35383 NA NA NA
contig2 256000 258000 cad003773 NA NA NA
contig3 27000 29000 cad028486 NA NA NA

command:
CRISPRessoWGS -b merged_mapped.sort.bam -r /home/data/bioinf_resources/ROB_IWGSC/EARLHAM/Triticum_aestivum_Cadenza_EIv1.1.fa -f locations.txt

INFO @ Mon, 25 Nov 2019 14:21:03:
Checking dependencies...

INFO @ Mon, 25 Nov 2019 14:21:03:

All the required dependencies are present!

INFO @ Mon, 25 Nov 2019 14:21:03:
Creating Folder CRISPRessoWGS_on_merged_mapped.sort

INFO @ Mon, 25 Nov 2019 14:21:03:
Done!

INFO @ Mon, 25 Nov 2019 14:21:03:
Index file for input .bam file exists, skipping generation.

INFO @ Mon, 25 Nov 2019 14:21:04:
The index for the reference fasta file is already present! Skipping generation.

INFO @ Mon, 25 Nov 2019 14:21:13:

Processing each region...

INFO @ Mon, 25 Nov 2019 14:21:13:

Extracting reads in:contig1:68000-69999 and create the .bam file: CRISPRessoWGS_on_merged_mapped.sort/ANALYZED_REGIONS/REGION_cad35383.bam

INFO @ Mon, 25 Nov 2019 14:21:17:
Trim reads and create a fastq.gz file in: CRISPRessoWGS_on_merged_mapped.sort/ANALYZED_REGIONS/REGION_cad35383.fastq.gz

CRITICAL @ Mon, 25 Nov 2019 14:21:19:

ERROR: too many values to unpack
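
"too many values to unpack" is Python's generic message when a sequence holds more items than the names it is being assigned to. Whether the region file is the cause here is not clear from the log, but a quick field-count check can rule it in or out. In this sketch the expected field count (7, matching the rows shown above) and the tab delimiter are assumptions, and `check_region_lines` is a hypothetical helper.

```python
def check_region_lines(lines, expected_fields=7, delimiter="\t"):
    """Return (line_number, field_count) for lines whose field count differs."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # ignore blank lines
        count = len(stripped.split(delimiter))
        if count != expected_fields:
            problems.append((number, count))
    return problems
```

Running it as `check_region_lines(open("locations.txt"))` would list any line that is, for example, space-separated instead of tab-separated.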

Error: global name 'errno' is not defined

Hi,

I got an error when trying CRISPResso in Docker. The NHEJ examples worked well, but the error global name 'errno' is not defined was reported when the multiple alleles and base editing examples were tested. CRISPResso was run in Docker for Windows 7/8, and my version is 2.0.31. It seems that the import of the errno module is missing in the file CRISPRessoShared.py.

Traceback (most recent call last):
File "/opt/conda/lib/python2.7/site-packages/CRISPResso2-2.0.31-py2.7-linux-x86_64.egg/CRISPResso2/CRISPRessoCORE.py", line 931, in main
CRISPRessoShared.force_symlink(os.path.abspath(args.fastq_r1),symlink_filename)
File "/opt/conda/lib/python2.7/site-packages/CRISPResso2-2.0.31-py2.7-linux-x86_64.egg/CRISPResso2/CRISPRessoShared.py", line 271, in force_symlink
if exc.errno == errno.EEXIST:
NameError: global name 'errno' is not defined

Hope this is helpful.
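
For illustration, a helper of this shape works once `errno` is imported at module level. Only the `exc.errno == errno.EEXIST` check is taken from the traceback; the rest of the body is a guess at what such a function does, not the project's actual code.

```python
import errno
import os


def force_symlink(src, dst):
    """Create symlink dst -> src, replacing dst if it already exists."""
    try:
        os.symlink(src, dst)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            # The link name is taken: remove it and try again.
            os.remove(dst)
            os.symlink(src, dst)
        else:
            raise
```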

Cannot find nucleotide information

Hi Crispresso team,
I'm trying to analyze an HDR experiment using the batch input, but 57 of my 130 samples were skipped with the following Info message: "Skipping the amplicon 'HDR' in folder 'xxx'. Cannot find nucleotide information."
I tried to check the code but I still don't understand which information I am missing. Can you help me with this?

Thanks

ERROR: 'numpy.float64' object has no attribute 'lower'

Hello, I am trying to run the CRISPRessoPooled command on pooled deep sequencing data, but I get the error ERROR: 'numpy.float64' object has no attribute 'lower' after FLASH finishes. Could you help me? Thank you.
Here is the command that I ran:
CRISPRessoPooled --fastq_r1 s1_FDTL190639699-1a_1.clean.fq.gz --fastq_r2 s1_FDTL190639699-1a_2.clean.fq.gz --min_paired_end_reads_overlap 10 --max_paired_end_reads_overlap 100 --amplicons_file amplicon.txt
The log file is here:
pool_error_log.txt

Incorporating Trimmomatic QC parameters in CRISPResso command

Dear Luca,

I have been trying to figure out how to incorporate the QC parameters for trimmomatic (LEADING TRAILING SLIDING WINDOW MINLEN) into the CRISPResso command. I can run the following command successfully:
CRISPResso --fastq_r1 {read1} --fastq_r2 {read2} --amplicon_seq {amplicon} --trim_sequences --trimmomatic_options_string ILLUMINACLIP:{adapters}:3:30:1:1:true --guide_seq {guide}

But it doesn't work if I add the QC parameters as they are in the trimmomatic command, such as:
CRISPResso --fastq_r1 {read1} --fastq_r2 {read2} --amplicon_seq {amplicon} --trim_sequences --trimmomatic_options_string ILLUMINACLIP:{adapters}:3:30:1:1:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 --guide_seq {guide}

Is there a way to do so or can the QC parameters be changed only through:
--min_average_read_quality
--min_single_bp_quality
--min_bp_quality

I am very new to this so I hope my question makes sense!
Thank you for your help,
Eleonore
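
One thing worth checking with the second command is quoting: a shell splits unquoted text on spaces, so LEADING:3 and the following steps would reach CRISPResso as separate arguments rather than as part of --trimmomatic_options_string. The sketch below builds the command so that the whole option string is a single argument. File names and sequences are placeholders, and it has not been verified here that CRISPResso forwards every step to Trimmomatic unchanged.

```python
import shlex


def build_command(read1, read2, amplicon, guide, trim_options):
    """Build an argument list in which the trimming options are one argument."""
    return [
        "CRISPResso",
        "--fastq_r1", read1,
        "--fastq_r2", read2,
        "--amplicon_seq", amplicon,
        "--guide_seq", guide,
        "--trim_sequences",
        "--trimmomatic_options_string", trim_options,
    ]


# Placeholder inputs; only the option names come from the post above.
options = ("ILLUMINACLIP:adapters.fa:3:30:1:1:true LEADING:3 TRAILING:3 "
           "SLIDINGWINDOW:4:15 MINLEN:36")
argv = build_command("r1.fastq.gz", "r2.fastq.gz", "ACGT", "ACGT", options)

# shlex.quote wraps the space-containing value in quotes for a POSIX shell.
shell_line = " ".join(shlex.quote(arg) for arg in argv)
print(shell_line)
```

The list form can be handed to `subprocess.run(argv)` directly, and the printed line shows the equivalent quoting for typing the command by hand.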

HDR mode error and quitting when no donor sequence detected

Hi,

I am running CRISPResso2 in HDR mode, and if a sample does not have any donor construct in the sequencing fastq files, it returns an error. For example, controls with no donor, or samples with donor but no HDR, return an error. The samples with donor run correctly.

It looks like when 0 reads are aligned to the inputted donor sequence, those samples report the error and quit. Samples with 1 alignment or more to the donor finish correctly.

This is the error:
Unexpected error, please check your input
ERROR: a float is required
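
The failing line is not shown in this report, so the following is only a sketch of the kind of guard that lets a sample with zero donor-aligned reads finish. `percent_of_total` is a hypothetical function, not CRISPResso2 code.

```python
def percent_of_total(count, total, ndigits=2):
    """Return count/total as a rounded percentage; missing or zero input gives 0.0."""
    if not count or not total:
        # Covers None, 0 and an empty denominator without raising.
        return 0.0
    return round(100.0 * float(count) / float(total), ndigits)
```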

Thanks!
Brandon

To reproduce
CRISPResso_RUNNING_LOG_float_error.txt

files to reproduce error
384A2-A7_S865_L001_R1.trimmed.paired.fastq.txt
384A2-A7_S865_L001_R2.trimmed.paired.fastq.txt

Debug output
CRISPResso_RUNNING_LOG_debug.txt

'2695 is not in list', u'occurred at index xx'

I use CRISPResso to analyze fastq files generated by the Nanopore/MinION sequencer/software. For many files CRISPResso generates:

Unexpected error, please check your input.
ERROR: ('2695 is not in list', u'occurred at index 17')

Attached are the complete running log and an example fastq file that fails. After removing a read that starts with @c207a413- the analysis completes, but I do not know why this particular read is bad.

Thanks,

Jarek

BC06_part_002.part_010.fastq.gz
CRISPResso_RUNNING_LOG.txt

Read merger doesn't combine any sequence

Describe the bug
I tried to run paired-end reads from a multiplexed-PCR experiment.
CRISPResso2 didn't manage to align any reads to the amplicon sequence (I chose to run on a single amplicon sequence out of the 50 possible sites I had in the experiment).

Then I created, from the multiplexed-qPCR experiment, two fastq files that contain only 100 reads from the same site (these files are attached).
I ran CRISPResso2 and it failed with the following merging statistics:
"[FLASH] Read combination statistics:
[FLASH] Total pairs: 100
[FLASH] Combined pairs: 0
[FLASH] Innie pairs: 0 (0.00% of combined)
[FLASH] Outie pairs: 0 (0.00% of combined)
[FLASH] Uncombined pairs: 100
[FLASH] Percent combined: 0.00%".

When I tried to merge the reads using a different program (PEAR), I managed to merge all 100 reads. 76 of them also align perfectly to the amplicon sequence.

  • Note - I tried to run both from Docker and bioconda.

To reproduce
Crispresso2:
crispresso --fastq_r1 CRISPResso2_exmaple_R1.fastq --fastq_r2 CRISPResso2_exmaple_R2.fastq --amplicon_seq CATATACTGCCATTGTGCAAAGCAAAATAGATGCCCTTTCCTCTAAGCAGTCGAAGCCTGAAGCCAACATTATTGGCTTACAGGTGAAGCCTGGGCTGAATTTCGGCTCCTACTAAGCACAGTCTAGCCTGCATTTTCATATG -g AGCCAACATTATTGGCTTAC

PEAR:
pear -f CRISPResso2_exmaple_R1.fastq -r CRISPResso2_exmaple_R2.fastq -o CRISPResso2_exmaple_merge

Debug output
I attached the original files (CRISPResso2_exmaple_R1.fastq and CRISPResso2_exmaple_R2.fastq) with CRISPResso2 and PEAR outputs.
CRISPResso2_bug.zip
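
When a merger combines no pairs, one thing to compare is the overlap the mates are expected to share against the allowed overlap range (the --min_paired_end_reads_overlap and --max_paired_end_reads_overlap options that appear in other commands on this page). The read lengths are not stated in this report, so the numbers used with this sketch are illustrative, and both function names are made up.

```python
def expected_overlap(read1_len, read2_len, amplicon_len):
    """Bases shared by the two mates when both read into the same fragment."""
    return read1_len + read2_len - amplicon_len


def overlap_within_limits(overlap, min_overlap, max_overlap):
    """True if the overlap falls inside the merger's accepted range."""
    return min_overlap <= overlap <= max_overlap
```

For example, two 150 bp mates on a 200 bp amplicon overlap by 100 bases; if the amplicon is shorter than a single read, the overlap exceeds the read length and a conservative maximum-overlap setting may reject every pair.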

CRISPResso command failed (return value 127) on region #0

Crispresso 2.0.23
Total region analyzed 18227 using CRISPRessoWGS
Similar message for all genomic regions.

Running CRISPResso on region #1/18227: /home/pankum/miniconda3/lib/python2.7/site-packages/CRISPResso.py -r1 /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted/ANALYZED_REGIONS/REGION_R_1.fastq.gz -a catctctctagggcaacgtcggctgcagctgagatggctgctccccggtg -o /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted --name R_1 --needleman_wunsch_gap_extend -2 --max_rows_alleles_around_cut_to_plot 50 --aln_seed_count 5 --needleman_wunsch_aln_matrix_loc EDNAFULL --quantification_window_size 1 --quantification_window_center -3 --trimmomatic_command trimmomatic --conversion_nuc_from C --min_bp_quality_or_N 0 --default_min_aln_score 60 --needleman_wunsch_gap_incentive 1 --plot_window_size 40 --aln_seed_min 2 --needleman_wunsch_gap_open -20 --aln_seed_len 10 --conversion_nuc_to T --min_single_bp_quality 0 --exclude_bp_from_left 15 --min_average_read_quality 0 --min_frequency_alleles_around_cut_to_plot 0.2 --exclude_bp_from_right 15
CRISPResso command failed (return value 127) on region #0: "/home/pankum/miniconda3/lib/python2.7/site-packages/CRISPResso.py -r1 /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted/ANALYZED_REGIONS/REGION_R_1.fastq.gz -a catctctctagggcaacgtcggctgcagctgagatggctgctccccggtg -o /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted --name R_1 --needleman_wunsch_gap_extend -2 --max_rows_alleles_around_cut_to_plot 50 --aln_seed_count 5 --needleman_wunsch_aln_matrix_loc EDNAFULL --quantification_window_size 1 --quantification_window_center -3 --trimmomatic_command trimmomatic --conversion_nuc_from C --min_bp_quality_or_N 0 --default_min_aln_score 60 --needleman_wunsch_gap_incentive 1 --plot_window_size 40 --aln_seed_min 2 --needleman_wunsch_gap_open -20 --aln_seed_len 10 --conversion_nuc_to T --min_single_bp_quality 0 --exclude_bp_from_left 15 --min_average_read_quality 0 --min_frequency_alleles_around_cut_to_plot 0.2 --exclude_bp_from_right 15"
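
Exit status 127 is what POSIX shells conventionally return when the command itself cannot be found. Whether that is the cause here is an assumption, but it can be checked by asking whether the first token of the logged command resolves to something executable. `resolve_command` is a hypothetical helper for Python 3.

```python
import shutil


def resolve_command(command):
    """Return the full path of command if it can be executed, else None."""
    return shutil.which(command)
```

Calling it on the path that starts the logged command line, and on plain `CRISPResso`, shows which of the two the shell would actually be able to run.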

Problem with installation

I got this message after installing CRISPResso2
"Your version of CRISPResso2 is out of date. Please download a new version." How can I resolve this issue?

some error need help

I installed this program using python setup.py build and python setup.py install --user. It runs with -h, but when I run the example it fails with the error 'ERROR: The specified file '/home/***/.local/lib/python2.7/site-packages/CRISPResso2-2.0.21-py2.7-linux-x86_64.egg/CRISPResso2/EDNAFULL' cannot be opened.' What does this mean?

Bad marshal data

Hi I have been having some trouble installing CRISPResso. I ended up installing using a specific conda environment. But when I run it with "-h" I get a message about bad marshal data.

Error message: Your version of CRISPResso2 is out of date

Describe the bug
Installed (today, 15 Apr 2019) using Bioconda, following the instructions in README.md (except that Anaconda python 2.7 was installed from a working URL found via Google). Ran the CRISPResso -h command and got the error message: "Your version of CRISPResso2 is out of date. Please download a new version."
Tried using it anyway, following the instructions for NHEJ in README.md, and got the same error message.

Expected behavior
Expected to see help information from the CRISPResso -h command. Expected some output (as per README.md) when running CRISPResso -r1 <r1_file> -r2 <r2_file> -a <amplicon_seq> -n nehj

To reproduce
CRISPResso -h
or
CRISPResso -r1 <r1_file> -r2 <r2_file> -a <amplicon_seq> -n nehj

Debug output

Your version of CRISPResso2 is out of date. Please download a new version.

Error after conda installation

Hi,

I installed CRISPResso2 by "conda install CRISPResso2" and I could not verify the installation by the command "CRISPResso -h"

I got the following error message.

" File "/Users/amarco/anaconda2/bin/Crispresso", line 11, in <module>
load_entry_point('CRISPResso2==2.0.23', 'console_scripts', 'CRISPResso')()
File "/Users/amarco/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/Users/amarco/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
return ep.load()
File "/Users/amarco/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2346, in load
return self.resolve()
File "/Users/amarco/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2352, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/Users/amarco/anaconda2/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 34, in <module>
from CRISPResso2 import CRISPRessoCOREResources
File "__init__.pxd", line 918, in init CRISPResso2.CRISPRessoCOREResources
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject"

How can I solve it?

Thanks in advance for your attention,

Andrés

How to visualize the alignments using CRISPResso2’s alignment viewer?

Hello
Supplementary Figure 9 shows the alignments of alleles produced by CRISPResso2’s alignment viewer. I have run the example file successfully, but the output folder lacks the alignments of alleles. Kindly guide me on how to draw them using CRISPResso2’s alignment viewer.
I also used the --base_editor_output option in the command.
Waiting for your response.

Regards
stressed
41587_2019_32_MOESM1_ESM.pdf

what is cached alignment?

Hello,

I have a result where cached aln is larger than computed aln. what is cached alignment? Are those reads mapped to the amplicon sequence? If we calculate mapping rate, do we count cached alignment?

INFO  @ Tue, 31 Mar 2020 13:31:37:

	 Finished reads; N_TOT_READS: 58988 N_COMPUTED_ALN: 8972 N_CACHED_ALN: 49945 N_COMPUTED_NOTALN: 71 N_CACHED_NOTALN: 0 

Thanks,
Yichao
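
The four sub-counters in the log line above add up to N_TOT_READS, which suggests that cached alignments are counted among the aligned reads. The sketch below parses the line and computes an aligned fraction on that basis. Treating N_CACHED_ALN as reads whose alignment was reused from an identical earlier sequence is an assumption, and the function names are made up.

```python
import re

# Log line copied from the post above.
LOG_LINE = ("Finished reads; N_TOT_READS: 58988 N_COMPUTED_ALN: 8972 "
            "N_CACHED_ALN: 49945 N_COMPUTED_NOTALN: 71 N_CACHED_NOTALN: 0")


def parse_counts(line):
    """Extract the N_* counters from a 'Finished reads' log line."""
    pairs = re.findall(r"(N_[A-Z_]+): (\d+)", line)
    return {key: int(value) for key, value in pairs}


def aligned_fraction(counts):
    """Fraction of all reads that were aligned, computed or cached."""
    aligned = counts["N_COMPUTED_ALN"] + counts["N_CACHED_ALN"]
    return aligned / float(counts["N_TOT_READS"])


counts = parse_counts(LOG_LINE)
```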

gRNA orientation, plus vs minus strand

Hi,

I need some clarification for assessing NHEJ editing. I don't personally run the analysis so pardon my non-technical descriptions. We currently use the batch mode to analyze our genome editing experiments, and look for editing at several different gRNA (target) sites at once.

If I understand correctly, the software predicts a cut site based on the gRNA sequence provided, but this predicted cut site will be different for guides designed for the minus strand. What is the best way to specify the orientation when the PAM and gRNA are on the minus strand? Is the best approach to always provide the minus strand guide sequence, and in that case do you also have to input the minus strand reference sequence? Or is it a better solution to move the predicted cut site and widen the window a bit to cover deletion events at either end of the guide sequence provided?

I just need to know the convention so that I can standardize our editing analysis for minus strand gRNAs.

Some additional info: we are using Cpf1 nucleases and our standard gRNA length is 24 bp.
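
Independently of how the tool handles orientation, it can help to check on which strand of the amplicon a given guide matches before standardizing inputs. A minimal sketch, with hypothetical function names; it says nothing about where CRISPResso2 places the cut site.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def guide_orientation(amplicon, guide):
    """Return 'plus', 'minus', or None depending on where the guide matches."""
    amplicon = amplicon.upper()
    guide = guide.upper()
    if guide in amplicon:
        return "plus"
    if reverse_complement(guide) in amplicon:
        return "minus"
    return None
```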

Any tips are greatly appreciated!

Thank you,
Anna

Question on large amplicons

Hi, I was just wondering if there is a way to provide a larger file as opposed to a small amplicon that you normally could easily supply on the command line?

On a related note, do you know if there are any programs that are designed for QC of larger amplicons or plasmids for example, of say a few kb? Thanks.
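
If the amplicon lives in a FASTA file, one workaround is to read it into a string and pass that string as the amplicon argument from a script instead of typing it. The sketch below only covers the reading step; `read_single_fasta` is a hypothetical helper, and whether CRISPResso2 can take the amplicon from a file directly is not addressed here.

```python
def read_single_fasta(path):
    """Return the sequence of the first record in a FASTA file as one string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the second record
                seen_header = True
                continue
            if line:
                parts.append(line)
    return "".join(parts).upper()
```

The result can then be placed after `-a` in an argument list given to `subprocess.run`, keeping in mind that operating systems limit the total length of a command line.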

Problem with conda installation

I got this message after installing CRISPResso2 from conda.
"Your version of CRISPResso2 is out of date. Please download a new version." How can I resolve this issue?

build image from your dockerfile

Describe the bug
I am trying to wrap your application together with some others into my application on top of your Dockerfile, but it looks like your original Dockerfile doesn't build.
My command: docker build -t name .
Error:
Traceback (most recent call last):
File "/opt/conda/bin/conda", line 7, in <module>
from conda.cli import main
ImportError: No module named 'conda'
The command '/bin/sh -c apt-get update && apt-get install gcc g++ python-numpy bowtie2 samtools -y --no-install-recommends && apt-get clean && rm -rf /var/lib/apt/lists/* && conda config --add channels defaults && conda config --add channels conda-forge && conda config --add channels bioconda && conda config --set remote_connect_timeout_secs 60 && conda config --set ssl_verify no && conda install --debug biopython && conda install --debug -c bioconda trimmomatic flash && conda clean -ay' returned a non-zero code: 1

Can you please point me in the right direction?

Thank you for your attention.

ERROR: [Errno 1] Operation not permitted: '.'

Describe the bug
Hi there,
I'm running a test using your files and it returns "Unexpected error, please check your input."
Do you have any suggestions?
docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 nhej.r1.fastq.gz --fastq_r2 nhej.r2.fastq.gz --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT -n nhej
Thanks

CRISPResso2: Import error.

When I installed CRISPResso2 in a new environment and checked if it was installed by using CRISPResso -h, I got the following error message.

Traceback (most recent call last):
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/bin/CRISPResso", line 11, in <module>
load_entry_point('CRISPResso2==2.0.32', 'console_scripts', 'CRISPResso')()
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 37, in <module>
from CRISPResso2 import CRISPRessoPlot
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/CRISPResso2/CRISPRessoPlot.py", line 19, in <module>
import seaborn as sns
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/__init__.py", line 6, in <module>
from .rcmod import *
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/rcmod.py", line 8, in <module>
from . import palettes, _orig_rc_params
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/palettes.py", line 12, in <module>
from .utils import desaturate, set_hls_values, get_color_cycle
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/utils.py", line 8, in <module>
from scipy import stats
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/stats/__init__.py", line 345, in <module>
from .stats import *
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/stats/stats.py", line 169, in <module>
import scipy.special as special
File "/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/__init__.py", line 640, in <module>
from ._ufuncs import *
ImportError: dlopen(/Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/_ufuncs.so, 2): Library not loaded: @rpath/libopenblas.dylib
Referenced from: /Users/pallavigosavi/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/_ufuncs.so
Reason: image not found

Could you please help me resolve this error?

Multiple Alleles discrimination

I'm hoping you can advise on using CRISPResso2 to analyze NHEJ outcomes for multiple alleles. I have two highly similar alleles with ~95% sequence similarity. When I use the multiple-alleles sidebar form to enter my two amplicons with an identical sgRNA, the majority of sequencing reads align to both alleles. Perhaps unsurprisingly, setting the minimum homology parameter to 100% returns no aligned reads, but 90% does not assign reads to one allele or the other. Can you suggest input parameters to test for better assignment of reads to the proper allele? Does the quantification window need to include the SNPs/variation that distinguish the amplicons in order to decide which allele a read maps to? Is CRISPResso2 capable of reliably assigning reads to alleles with this much similarity?
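To make the question about distinguishing SNPs concrete, here is a minimal sketch of assigning a read to one of two alleles using only the positions where the alleles differ. It is not CRISPResso2's method; the function names are hypothetical, and it assumes the read and both alleles are already aligned to the same length with no indels, which real NHEJ reads will often violate.

```python
def distinguishing_positions(allele_a, allele_b):
    """Return the indices at which two equal-length allele sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must have the same length in this sketch")
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


def assign_allele(read, allele_a, allele_b):
    """Assign a read to 'A', 'B', or 'ambiguous' using only the differing positions.

    Assumption: the read has the same length as the alleles (no indels).
    """
    if len(read) != len(allele_a):
        raise ValueError("read must have the same length as the alleles")
    positions = distinguishing_positions(allele_a, allele_b)
    score_a = sum(read[i] == allele_a[i] for i in positions)
    score_b = sum(read[i] == allele_b[i] for i in positions)
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "ambiguous"
```

With ~95% similarity, only about 1 in 20 positions is informative, so a read that does not cover those positions cannot be assigned by this kind of comparison.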

alignment problem

Describe the bug
Alignment issue - main concern here is probably the amplicon sequence (is it the sgRNA sequence (a.k.a the recognition sequence), or would the adapters/index sequences need to be included?)

ERROR: Error: No alignments were found

Expected behavior
Alignment of amplicon sequences - Not sure which one is supposed to be used - the sgRNA region/scaffold region of the amplicon product

To reproduce
CRISPResso command to reproduce the behavior.
docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 G1.fastq.gz --amplicon_seq ACACTCTTTCCCTACACGACGCTCTTCCGATCT

Debug output
Paste the entire output when you run CRISPResso with the flag --debug.
Traceback (most recent call last):
File "/opt/conda/lib/python2.7/site-packages/CRISPResso2-2.0.30-py2.7-linux-x86_64.egg/CRISPResso2/CRISPRessoCORE.py", line 419, in main
CRISPRessoShared.check_file(args.fastq_r1)
File "/opt/conda/lib/python2.7/site-packages/CRISPResso2-2.0.30-py2.7-linux-x86_64.egg/CRISPResso2/CRISPRessoShared.py", line 259, in check_file
raise BadParameterException("The specified file '"+filename + "' cannot be opened.\nAvailable files in current directory: " + str(files_in_dir))
BadParameterException: The specified file 'G1.fastq.gz' cannot be opened.
Available files in current directory: ['nhej.r1.fastq.gz', 'CRISPResso_on_nhej', 'Homo_sapiens.zip', 'hg19', 'CRISPResso_on_nhej.html', 'Test_data_20-8-19', 'nhej.r2.fastq.gz']

CRISPRessoWGS unexpected error

Hi, I am getting the following error with some of the trimmed fastq files that are produced by CRISPRessoWGS. I ran fastqc on those failed fastq files and found that some of the sequences are empty and CRISPResso fails because of these empty sequences. Can you please update the code to omit those empty fastq reads after trimming?


[Execution log]:
Aligning sequences...
Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 10000 N_COMPUTED_ALN: 298 N_CACHED_ALN: 9702 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 20000 N_COMPUTED_ALN: 422 N_CACHED_ALN: 19578 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 30000 N_COMPUTED_ALN: 523 N_CACHED_ALN: 29477 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 40000 N_COMPUTED_ALN: 616 N_CACHED_ALN: 39384 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 50000 N_COMPUTED_ALN: 708 N_CACHED_ALN: 49292 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 60000 N_COMPUTED_ALN: 886 N_CACHED_ALN: 59114 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Unexpected error, please check your input.

ERROR: '*'

The part of CRISPRessoWGSCORE.py that would need updating is below (the check "if len(seq)>0" would be added):

def write_trimmed_fastq(in_bam_filename,bpstart,bpend,out_fastq_filename):
    p = sb.Popen(
        'samtools view %s | cut -f1,4,6,10,11' % in_bam_filename,
        stdout = sb.PIPE,
        stderr = sb.STDOUT,
        shell=True
    )

    output=p.communicate()[0]
    n_reads=0

    with gzip.open(out_fastq_filename,'w+') as outfile:

        for line in output.split('\n'):
            if line:
                (name,pos,cigar,seq,qual)=line.split()
                #print name,pos,cigar,seq
                pos=int(pos)
                positions=get_reference_positions(pos,cigar)

                if bpstart in positions and bpend in positions:# and positions[0]<=bpstart and  positions[-1]>=bpend:

                    st=positions.index(bpstart)
                    en=find_last(positions,bpend)
                    #print st,en,seq,seq[st:en]
                    n_reads+=1
                    #print '>%s\n%s\n+\n%s\n' %(name,seq[st:en],qual[st:en])
                    outfile.write('@%s_%d\n%s\n+\n%s\n' %(name,n_reads,seq[st:en],qual[st:en]))
    return n_reads
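
The change proposed above could look roughly like the following. This is only a sketch of the idea of skipping empty trimmed reads, not the project's actual fix; the helper name trimmed_fastq_record is hypothetical and isolates the record-building step so it can be checked without samtools.

```python
def trimmed_fastq_record(name, read_number, seq, qual, st, en):
    """Build a FASTQ record for seq[st:en], or return None if the trimmed read is empty.

    Hypothetical helper: it mirrors the outfile.write(...) call in
    write_trimmed_fastq, with the proposed length check added.
    """
    trimmed_seq = seq[st:en]
    trimmed_qual = qual[st:en]
    if len(trimmed_seq) == 0:
        # Skip reads that would produce an empty sequence line.
        return None
    return '@%s_%d\n%s\n+\n%s\n' % (name, read_number, trimmed_seq, trimmed_qual)
```

Inside the loop, n_reads would then be incremented only when the helper returns a record, so the read counter still matches the number of records written.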

CRISPRessoCompare

Hi, I have the following error using CRISPRessoCompare. I think the input is ok, since CRISPResso was successfully completed for both the samples I'm comparing.
Any idea?

CRISPRessoCompare 1/CRISPResso_on_1_S1_L001_R1_001.fq_1_S1_L001_R2_001.fq/ 2/CRISPResso_on_2_S2_L001_R1_001.fq_2_S2_L001_R2_001.fq/ --debug

...
Traceback (most recent call last):
File "/opt/anaconda3/envs/editing/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCompareCORE.py", line 184, in main
N_TOTAL_1 = float(amplicon_info_1[amplicon_name]['Total'])
KeyError: 'Total'
CRITICAL @ Mon, 20 May 2019 14:40:45:

ERROR: 'Total'

bowtie2 index check fails on large indexes

Bowtie 2 indexes for genomes shorter than about 4 billion nucleotides are stored in files with the .bt2 extension; indexes for larger genomes are stored in files with the .bt2l extension. CRISPResso checks that the user provided a valid bowtie2 index by looking for a file with a .bt2 extension, so the check fails when the user provides an index for a large genome.
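
A check that accepts both index flavors could be sketched as follows. The function name is hypothetical and this is not CRISPResso's actual code; it assumes the usual Bowtie 2 naming, in which the first index file is the prefix followed by .1.bt2 or .1.bt2l.

```python
import os


def bowtie2_index_exists(index_prefix):
    """Return True if a small (.bt2) or large (.bt2l) Bowtie 2 index exists.

    Only the first index file (<prefix>.1.bt2 or <prefix>.1.bt2l) is checked,
    which is an assumption made to keep the sketch short.
    """
    return any(
        os.path.exists(index_prefix + '.1' + extension)
        for extension in ('.bt2', '.bt2l')
    )
```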

Install issue with CRISPResso2

Hi there,

I'm having an issue with the install of crispresso2. I'm new to running Python code and to using the terminal in general, so my apologies if this is a basic question. I can download the program and make a new environment with crispresso2; however, when I verify that the files are there I get the following error message:

Traceback (most recent call last):
File "/Users/name/opt/anaconda2/envs/crispresso2_env/bin/crispresso", line 11, in <module>
load_entry_point('CRISPResso2==2.0.32', 'console_scripts', 'CRISPResso')()
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 37, in <module>
from CRISPResso2 import CRISPRessoPlot
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/CRISPResso2/CRISPRessoPlot.py", line 19, in <module>
import seaborn as sns
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/__init__.py", line 6, in <module>
from .rcmod import *
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/rcmod.py", line 8, in <module>
from . import palettes, _orig_rc_params
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/palettes.py", line 12, in <module>
from .utils import desaturate, set_hls_values, get_color_cycle
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/seaborn/utils.py", line 8, in <module>
from scipy import stats
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/stats/__init__.py", line 345, in <module>
from .stats import *
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/stats/stats.py", line 169, in <module>
import scipy.special as special
File "/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/__init__.py", line 640, in <module>
from ._ufuncs import *
ImportError: dlopen(/Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/_ufuncs.so, 2): Library not loaded: @rpath/libopenblas.dylib
Referenced from: /Users/name/opt/anaconda2/envs/crispresso2_env/lib/python2.7/site-packages/scipy/special/_ufuncs.so
Reason: image not found

Any help would be appreciated! Thank you in advance!

How to compute editing efficiency

Hi,
Could you tell me more about how on-target editing efficiency is computed, or point me to a corresponding reference?

Thanks,
Raulee

Image size is too large

Hi!
I installed and use CRISPResso2 via Docker. It seems to complete the quantification and calculation steps, but it does not make plots; see the run log below. Any suggestions?
Thanks,
Jarek

[Execution log]:
Aligning sequences...
Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Finished reads; N_TOT_READS: 2174 N_COMPUTED_ALN: 2062 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 112 N_CACHED_NOTALN: 0
Done!
Quantifying indels/substitutions...
Done!
Calculating allele frequencies...
Done!
Saving processed data...
Making Plots...
Unexpected error, please check your input.

ERROR: Image size of 307900x400 pixels is too large. It must be less than 2^16 in each direction.
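
The limit quoted in the error message can be checked with simple arithmetic: a figure's pixel size is its size in inches multiplied by the DPI, and each dimension must stay below 2^16 = 65536. The sketch below illustrates only that arithmetic; the function names are hypothetical, and the 3079 x 4 inch size at 100 DPI is an assumption chosen to reproduce the reported 307900x400 pixels, not a value taken from the run.

```python
MAX_PIXELS = 2 ** 16  # limit quoted in the error message, per direction


def figure_pixels(width_in, height_in, dpi):
    """Return the (width, height) of a figure in pixels."""
    return int(round(width_in * dpi)), int(round(height_in * dpi))


def fits_pixel_limit(width_in, height_in, dpi):
    """Return True if both pixel dimensions are below the 2^16 limit."""
    width_px, height_px = figure_pixels(width_in, height_in, dpi)
    return width_px < MAX_PIXELS and height_px < MAX_PIXELS
```

Under these assumed values, a plot whose width grows with the amplicon length would exceed the limit once the width passes about 655 inches at 100 DPI.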

Alignment error, please check your input

Dear Luca,

I am encountering the below error when running the command:

CRISPResso
-r1 /work/rpapa/sbelleghem/mutant_miSeq/fastq_trimmed/TL12_R1_paired.fastq.gz
-r2 /work/rpapa/sbelleghem/mutant_miSeq/fastq_trimmed/TL12_R2_paired.fastq.gz
--amplicon_seq ATTGGATCTTAAAAGCTTGGGCTAAGCTCATGTCGACGGTCAGTAATTAGCATTCCGCATATAGTTTACAAAGCATTGCCGTTGTAAATTATTGGAAACTATAATCTTGTGCAAAAACTTGTTTTTTTATAAATATTATAAAATATATTCGTACAGGATTGAAATATAAAAAAAACATATCAGCTGCGAATAAAATTAATAGAGAATAAAAAAATATACTTATATCACAGCGACATATTTATTTTATTCTCTATTTTATTCACATTATATTTTTACTCCATGCCAAATTGATAATAGAATATGAACCTGTAACAACAGTCCTTAAAAATCCAAAACGATTATTAAGTGGTTTAATATTTTTACATAACAACATCAAATAATTTAAATTATATCTATTTCTAGGTAATACAGACAGGTGCTCAACAGGCGGTTGAAGAGTGTCAATACCAATTCCGAAACAGCCGCTGGAACTGCAGCACTGTCGAAAACAGCACTGATATATTTGGAGGAGTACTTAAATTTAGTAAGTAAAAGTTAAATTTTTGATTTAAATTTGTAAATCCTTTTTAATTGACAACCTAAATACTTATTTTTATTTGGATATATTATATAAAAATGTTGGATGAGTTTGGATTCCACTTACTACTTGGCTTCTTGAGCACTAACTTTAAAAATATATAAATTCTATTTGGAAAACGAAAGAAATAAGATTTCAAATGATCTATAACTAACAATTTTTATTATGATAAACCACAAACAACTATACAAAACGATTTACACGTAAAATTAACATATTCTCAACATATTACACAAATAATACTACCGTTAACTCAAAATTGGCATATACATATAAATAAATCTTGAATCATAAAATTCATTTCCGCTCGGATTTCAAGTCAAAGTAAGTTGTAAATTCTCAAATAATTATCGGTTGCATACATCGGCAACTCTTCAAAGGACGTGTTAAGTG
--max_paired_end_reads_overlap 150
--name TL12
--output_folder /work/rpapa/sbelleghem/mutant_miSeq/CRISPResso_out

###############
INFO @ Wed, 05 Jun 2019 13:10:08:
Finished reads; N_TOT_READS: 29195 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 6874 N_CACHED_NOTALN: 22321

INFO @ Wed, 05 Jun 2019 13:10:08:
Done!

INFO @ Wed, 05 Jun 2019 13:10:08:
Quantifying indels/substitutions...

INFO @ Wed, 05 Jun 2019 13:10:08:
Done!

CRITICAL @ Wed, 05 Jun 2019 13:10:08:
Alignment error, please check your input.

ERROR: Error: No alignments were found
#############

Would you have any advice on what to check to know what is going wrong? The amplicon should be fine as I can easily find alignable sequences in my fastq files.

Thank you for any help!

Steven

list index out of range when amplicon and amplicon_after_hdr match almost entirely

Dear all,

Thanks for this very nice software. I have the (admittedly a bit unlikely) case where the 290 bp amplicon and the 290 bp amplicon after HDR are almost identical except for one position in the middle: the desired mutation. Also, the last 100 bp of the amplicon are coding sequence. In this case CRISPResso crashes with the error IndexError: list index out of range.

Here is the CRISPResso command and the output:

CRISPResso -r1 /projects/seq-work/analysis/susannr/bfx1165/crispresso_bug/CRISPRessoPooled_on_KitW41_ET93/AMPL_Kit_exon18.fastq.gz -a AAGGAAGGTTAGAACCCCTGGACTTCTCTGCTCTTAGTTTACTGTCCTATACTGACTCAACACCCCTATTTTAAAGGGAGATATTAGAATTTTGAATTATAAGTAGGGGAGGTGGCTGGAGGTCACAAGGTTTAAGGTCCTCGTCTATCGCTGTCTTCATTAGCTGCTTGAATTTGCTGTGTTCCGTTCTAGGCACGACTGCCCGTGAAGTGGATGGCACCAGAGAGCATTTTCAGCTGCGTGTACACATTTGAAAGTGATGTCTGGTCCTATGGGATTTTCCTCTGGGAGCT -o /projects/seq-work/analysis/susannr/bfx1165/crispresso_bug/CRISPRessoPooled_on_KitW41_ET93 --name Kit_exon18 -g ACGACTGCCCGTGAAGTGGA,AGGCACGACTGCCCGTGAAG -e AAGGAAGGTTAGAACCCCTGGACTTCTCTGCTCTTAGTTTACTGTCCTATACTGACTCAACACCCCTATTTTAAAGGGAGATATTAGAATTTTGAATTATAAGTAGGGGAGGTGGCTGGAGGTCACAAGGTTTAAGGTCCTCGTCTATCGCTGTCTTCATTAGCTGCTTGAATTTGCTGTGTTCCGTTCTAGGCACGACTGCCCATGAAGTGGATGGCACCAGAGAGCATTTTCAGCTGCGTGTACACATTTGAAAGTGATGTCTGGTCCTATGGGATTTTCCTCTGGGAGCT -c GCACGACTGCCCGTGAAGTGGATGGCACCAGAGAGCATTTTCAGCTGCGTGTACACATTTGAAAGTGATGTCTGGTCCTATGGGATTTTCCTCTGGGAGCT --needleman_wunsch_gap_extend -2 --max_rows_alleles_around_cut_to_plot 50 --aln_seed_count 5 --plot_window_size 20 --quantification_window_size 1 --quantification_window_center -3 --trimmomatic_options_string "ILLUMINACLIP:/home/sequencing/andpetzo/miniconda3/ngs10/envs/Crispresso2/lib/python2.7/site-packages/CRISPResso2/data/adapters/TruSeq2-PE.fa:0:90:10:0:true MINLEN:40" --trimmomatic_command trimmomatic --conversion_nuc_from C --min_bp_quality_or_N 0 --default_min_aln_score 60 --needleman_wunsch_gap_incentive 1 --min_paired_end_reads_overlap 10 --needleman_wunsch_aln_matrix_loc EDNAFULL --aln_seed_min 2 --needleman_wunsch_gap_open -20 --aln_seed_len 10 --max_paired_end_reads_overlap 300 --conversion_nuc_to T --trim_sequences --min_single_bp_quality 0 --exclude_bp_from_left 15 --keep_intermediate --min_average_read_quality 0 --flash_command flash --min_frequency_alleles_around_cut_to_plot 0.2 --exclude_bp_from_right 15 --debug

                               ~~~CRISPResso 2~~~                               
        -Analysis of genome editing outcomes from deep sequencing data-         
                                                                                
                                        _                                       
                                       '  )                                     
                                       .-'                                      
                                      (____                                     
                                   C)|     \                                    
                                     \     /                                    
                                      \___/                                     

                          [CRISPresso version 2.0.29]                           
                    [Kendell Clement and Luca Pinello 2019]                     
                 [For support contact [email protected]]                 

INFO  @ Sat, 13 Jul 2019 13:20:22:
	 Using cut points from Reference as template for other references 

INFO  @ Sat, 13 Jul 2019 13:20:22:
	 Reference 'Reference' has cut points defined: [210, 206]. Not inferring. 

INFO  @ Sat, 13 Jul 2019 13:20:22:
	 Reference 'Reference' has sgRNA_intervals defined: [(194, 213), (190, 209)]. Not inferring. 

INFO  @ Sat, 13 Jul 2019 13:20:22:
	 Reference 'Reference' has exon_positions defined: [192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292]. Not inferring. 

INFO  @ Sat, 13 Jul 2019 13:20:22:
	 Reference 'HDR' has NO cut points or sgRNA intervals idxs defined. Inferring from 'Reference'. 

INFO  @ Sat, 13 Jul 2019 13:20:22:
	 Reference 'HDR' has NO exon_positions defined. Inferring from 'Reference'. 

Traceback (most recent call last):
  File "/home/sequencing/andpetzo/miniconda3/ngs10/envs/Crispresso2/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 809, in main
    this_exon_end = s1inds[exon_interval_end]
IndexError: list index out of range
CRITICAL @ Sat, 13 Jul 2019 13:20:22:
	 Traceback (most recent call last):
  File "/home/sequencing/andpetzo/miniconda3/ngs10/envs/Crispresso2/lib/python2.7/site-packages/CRISPResso2/CRISPRessoCORE.py", line 809, in main
    this_exon_end = s1inds[exon_interval_end]
IndexError: list index out of range
 

CRITICAL @ Sat, 13 Jul 2019 13:20:22:
	 Unexpected error, please check your input.

ERROR: list index out of range 

I have checked the position in the code where it crashes and it looks like exon_interval_end exceeds the length of s1inds. I guess that when inferring the coding sequence positions of the amplicon after hdr from the original amplicon, CRISPResso does not like the fact that the coding sequence covers the last 100bp :-) ?

When I extend the original amplicon by one base at each side, CRISPResso works fine.

Many thanks,

Andreas

Insertions longer than the remaining reference sequence are missed

Describe the bug
Amplicon sequences containing long insertions at the end are classified as "UNMODIFIED". A more detailed investigation indicated that this happens as soon as an insertion is longer than the remaining reference sequence. For example, given an 8 bp reference sequence and an aligned sequence with a 4 bp insertion after position 4 (1-indexed) of the reference, the amplicon is correctly identified as "MODIFIED", but with a 5 bp insertion it is falsely classified as "UNMODIFIED". The problem seems to be in the function CRISPRessoCOREResources.find_indels_substitutions (see the minimal example below), possibly in how gaps in the reference sequence are handled.

Expected behavior
Insertions longer than the remaining reference sequence should not be missed.

To reproduce
A minimal example from within CRISPResso2:

ipdb> CRISPRessoCOREResources.find_indels_substitutions("AAAGCCCCGTTT", "AAAG----GTTT", np.array([0,1,2,3,4,5,6,7]))['insertion_n']
4
ipdb> CRISPRessoCOREResources.find_indels_substitutions("AAAGCCCCCGTTT", "AAAG-----GTTT", np.array([0,1,2,3,4,5,6,7]))['insertion_n']
0.0
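
For comparison, the expected behavior can be sketched with a small standalone function that reports every run of gaps in the aligned reference as an insertion. This is not the CRISPResso2 implementation: the name find_insertions is hypothetical, and the sketch ignores the quantification-window argument that find_indels_substitutions takes.

```python
def find_insertions(read_aln, ref_aln):
    """Return (last_ref_index_before_insertion, length) for each insertion.

    An insertion is a run of '-' characters in the aligned reference.
    Reference indices are 0-based and count only non-gap reference bases.
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")
    insertions = []
    ref_pos = -1   # index of the last reference base seen so far
    run_len = 0    # length of the current run of reference gaps
    for ref_base in ref_aln:
        if ref_base == '-':
            run_len += 1
        else:
            if run_len:
                insertions.append((ref_pos, run_len))
                run_len = 0
            ref_pos += 1
    if run_len:
        # Insertion that runs to the end of the alignment.
        insertions.append((ref_pos, run_len))
    return insertions
```

Applied to the two alignments in the example above, this reports one insertion of length 4 and one of length 5 respectively, regardless of how much reference sequence follows.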

CRISPRessoPooled get empty results

I used CRISPRessoPooled to analyze 7000 amplicons, but got the following error:

Read merging, mapping, and the intermediate BAM file look fine, but something goes wrong with samtools, and most of the intermediate *.fastq.gz files are then empty, with a few *.fastq.gz files containing one or two reads. I ran the Docker image with Singularity. The error logs follow:

49352833 reads; of these:
49352833 (100.00%) were unpaired; of these:
492198 (1.00%) aligned 0 times
48860635 (99.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
99.00% overall alignment rate
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
Demultiplex reads and run CRISPResso on each amplicon...

Processing:1
Skipping amplicon [1] because no reads align to it

Processing:2
Skipping amplicon [2] because no reads align to it

window size quantification clarification

Hi, I've noticed a discrepancy in the documentation of the quantification window, where different default window sizes are given. I have CRISPResso 2.0.25 installed, and it says the default window size is 1. However, the online manual says the default window size is 2 bp (1 upstream, 1 downstream). I have used CRISPResso 1 previously and remember that the total window size in bp was double the entered value (a window size of 2 bp would be 4 bp total, 2 on each side). In CRISPResso 2 it looks like this parameter was changed. This is important since we would like to keep the same window sizes as in our analyses that were run before CRISPResso 2 was released.
I suspect the --help documentation is incorrect; is that right?

When looking at CRISPResso --help
[CRISPresso version 2.0.25]
-w QUANTIFICATION_WINDOW_SIZE ............... (default: 1)

In the crispresso 2 online manual:
-w or --quantification_window_size ............... Default size is 2bp, which is 1bp up- and 1bp downstream from the quantification window center. (default: 2)

Any clarification is much appreciated, Thanks!
Brandon
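
The "value on each side" convention described in the post can be written out as a short sketch. It only illustrates the arithmetic (total window = 2 x value) and makes no claim about which convention any given CRISPResso version uses; the function name and the choice to place the cut between index center and center + 1 are assumptions.

```python
def window_positions(center, bp_each_side):
    """Return the reference indices covered by a window around a cut site.

    Assumption: the cut lies between index `center` and `center + 1`, and the
    window extends `bp_each_side` bases in both directions.
    """
    return list(range(center - bp_each_side + 1, center + bp_each_side + 1))
```

For example, with the cut after index 10, a value of 1 covers indices 10 and 11 (2 bp total), and a value of 2 covers indices 9 through 12 (4 bp total).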

CRISPRessoWGS tutorial

Dear,
Could you provide a detailed CRISPRessoWGS tutorial, including the FASTQ data and the corresponding code? Alternatively, could you explain the tests directory?

ERROR: Flash failed to run, please check the log file.

Hello,
I am trying to run the CRISPRessoPooled command on pooled deep sequencing data.
I ran my script several times as suggested, but got the error "ERROR: Flash failed to run, please check the log file".
Here is the command that I ran:
(base) bilal@ubuntu:~/workshop/pooled samples$ CRISPRessoPooled -r1 6297_1_1.fastq.rar -r2 6297_1_2.fastq.rar -f support_file.txt
Flash Package is already installed.

`~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-

           _                                                 _              
          '  )                                              '  )            
          .-'          _______________________              .-'             
         (____         | __  __  __     __ __  |           (____            
      C)|     \        ||__)/  \/  \|  |_ |  \ |        C)|     \           
        \     /        ||   \__/\__/|__|__|__/ |          \     /           
         \___/         |_______________________|           \___/            

                      [CRISPresso version 2.0.27]                           
                [Kendell Clement and Luca Pinello 2019]                     
             [For support contact [email protected]]                 

INFO @ Sat, 27 Apr 2019 07:20:48:
Checking dependencies...

INFO @ Sat, 27 Apr 2019 07:20:48:

All the required dependencies are present!

INFO @ Sat, 27 Apr 2019 07:20:48:
Only the Amplicon description file was provided. The analysis will be perfomed using only the provided amplicons sequences.

INFO @ Sat, 27 Apr 2019 07:20:48:
Creating Folder CRISPRessoPooled_on_6297_1_1.rar_6297_1_2.rar

WARNING @ Sat, 27 Apr 2019 07:20:48:
Folder CRISPRessoPooled_on_6297_1_1.rar_6297_1_2.rar already exists.

INFO @ Sat, 27 Apr 2019 07:20:48:
Merging paired sequences with Flash...

CRITICAL @ Sat, 27 Apr 2019 07:20:48:

ERROR: Flash failed to run, please check the log file.
`

the log file is attached
CRISPRessoPooled_RUNNING_LOG.txt
Kindly help me
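
The command above passes input files whose names end in .rar. As a hedged aside, one thing that may be worth ruling out is whether the inputs are actually plain or gzip-compressed FASTQ. The sketch below inspects the first bytes of a file; the function name is hypothetical, and the idea that the read-merging step cannot read RAR archives is an assumption that the post does not confirm.

```python
GZIP_MAGIC = b'\x1f\x8b'       # first two bytes of a gzip file
RAR_MAGIC = b'Rar!\x1a\x07'    # first six bytes of a RAR archive


def sniff_format(path):
    """Guess whether a file is gzip, RAR, or plain FASTQ from its first bytes."""
    with open(path, 'rb') as handle:
        head = handle.read(8)
    if head.startswith(GZIP_MAGIC):
        return 'gzip'
    if head.startswith(RAR_MAGIC):
        return 'rar'
    if head.startswith(b'@'):
        # FASTQ records start with an '@' header line.
        return 'fastq'
    return 'unknown'
```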
