crispr_dart's Introduction

crispr-DART (Downstream Analysis and Reporting Tool)

crispr-DART is a pipeline to process, analyse, and report on CRISPR-Cas9-induced genome editing outcomes from high-throughput sequencing of target regions of interest.

crispr-DART was developed as part of the study "Parallel genetics of regulatory sequences using scalable genome editing in vivo", which is now published in Cell Reports: Froehlich, J. & Uyar, B. et al., Cell Reports, 2021.

The study was also covered in the news: Scaling up genome editing big in tiny worms.

Pipeline scheme

The pipeline accepts single-end or paired-end Illumina reads, or long PacBio reads, from both DNA and RNA samples.

The pipeline consists of the following steps:

  • Quality control (FastQC/MultiQC) and improvement (Trim Galore!) of raw reads
  • Mapping the reads to the genome of interest (BBMap)
  • Extracting statistics about the detected insertions and deletions (various R libraries including GenomicAlignments and Rsamtools)
  • Reporting the editing outcomes in interactive reports organised into a website (rmarkdown::render_site)
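To make the indel-statistics step more concrete, here is a minimal Python sketch of how inserted and deleted bases can be read off the CIGAR string of an aligned read. It is an illustration only: the pipeline itself does this with R libraries, and the function name count_indels and the example CIGAR string are assumptions made for this sketch.

```python
import re
from collections import Counter

# One CIGAR operation is a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_indels(cigar):
    """Return the number of inserted and deleted bases in a CIGAR string.

    Hypothetical helper for illustration; not part of crispr-DART.
    """
    totals = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        if op in ("I", "D"):
            totals[op] += int(length)
    return {"insertions": totals["I"], "deletions": totals["D"]}


# Made-up read: 50 aligned bases, a 2-base insertion, 30 aligned bases,
# a 5-base deletion, then 20 aligned bases.
print(count_indels("50M2I30M5D20M"))
```

Summing such counts per position across all reads of a sample would give the kind of per-region indel statistics that the reports summarise.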

[Figure: pipeline scheme]

Example HTML report output

The HTML reports produced by the pipeline are automatically organised into a website. An example report website can be browsed here.

Example screenshots from the reports

[Figure: example screenshots from the HTML reports]

Installation

  1. Download the source code:
> git clone https://github.com/BIMSBbioinfo/crispr_DART.git
  2. Create a guix profile with dependencies:
> mkdir -p $HOME/guix-profiles/crispr_dart
> guix package --manifest=guix.scm --profile=$HOME/guix-profiles/crispr_dart

# activate env
> source ~/guix-profiles/crispr_dart/etc/profile
  3. Test the installation on sample data:
> snakemake -s snakefile.py --configfile sample_data/settings.yaml --cores 4 --printshellcmds

How to run the pipeline

Preparing the input files

The pipeline currently requires four different input files.

  1. A sample sheet file, which describes the samples, the associated fastq files, the sets of sgRNAs used in each sample, and the list of regions of interest.

Please see the example sample sheet file under sample_data/sample_sheet.csv.

  2. A BED file containing the genomic coordinates of all the sgRNAs used in the project.

Please see the example BED file for sgRNA target sites under sample_data/cut_sites.bed.

  3. A comparisons table, which is used for comparing pairs of samples in terms of genome editing outcomes.

Please see the example table under sample_data/comparisons.tsv.

  4. A settings file, which holds the paths to the other input files along with additional configuration such as the resource requirements of the tools.

Please see the example file under sample_data/settings.yaml.

The sample_data/fasta folder contains fasta format sequence files that are used as the target genome sequence. The sample_data/reads folder contains sample read files (fastq.gz files from Illumina and PacBio sequenced samples).
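Before launching a run, it can help to sanity-check the sgRNA BED file. The following is a minimal Python sketch under stated assumptions: it expects the generic tab-separated BED layout (chromosome, start, end, optional name), which may not match every column of sample_data/cut_sites.bed, and the function name parse_bed and the example records are hypothetical.

```python
def parse_bed(text):
    """Parse BED-formatted text into (chrom, start, end, name) tuples.

    Hypothetical helper for illustration; assumes tab-separated fields.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines and the comment/header lines that BED allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        name = fields[3] if len(fields) > 3 else None
        records.append((chrom, start, end, name))
    return records


# Made-up records, not taken from the sample data.
example = "chrI\t100\t120\tsg1\nchrII\t500\t520\tsg2\n"
print(parse_bed(example))
```

To check a real file, the same function could be called on the file contents, for example parse_bed(open("sample_data/cut_sites.bed").read()).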

Running the pipeline

Once the settings.yaml file is configured with paths to all the other required files, the pipeline can be run with snakemake. The command below requests 4 cores:

> snakemake -s snakefile.py --configfile /path/to/settings.yaml --cores 4 --printshellcmds

If you would like to do a dry run, meaning that the list of jobs is created but not executed, add the --dryrun flag:

> snakemake -s snakefile.py --configfile /path/to/settings.yaml --cores 4 --dryrun --printshellcmds
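When the pipeline is launched from a script rather than typed by hand, the two invocations above can be assembled programmatically. This is a minimal Python sketch; the function name snakemake_command is hypothetical, and it only uses the flags that already appear in the commands above.

```python
import shlex


def snakemake_command(settings_path, cores=4, dry_run=False):
    """Build the snakemake argument list used to run the pipeline.

    Hypothetical helper for illustration; not part of crispr-DART.
    """
    cmd = [
        "snakemake",
        "-s", "snakefile.py",
        "--configfile", settings_path,
        "--cores", str(cores),
    ]
    if dry_run:
        # List the jobs without executing them.
        cmd.append("--dryrun")
    cmd.append("--printshellcmds")
    return cmd


# Print the dry-run command line, quoted safely for a shell.
cmd = snakemake_command("/path/to/settings.yaml", dry_run=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the guix profile has been activated.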

How to cite

See the publication in Cell Reports: Froehlich, J. & Uyar, B. et al., Cell Reports, 2021.

Credits

The software has been developed by Bora Uyar from the Akalin Lab, with significant conceptual contributions by Jonathan Froehlich from the N. Rajewsky Lab, at the Berlin Institute of Medical Systems Biology of the Max-Delbruck-Center for Molecular Medicine.

crispr_dart's People

Contributors

borauyar

crispr_dart's Issues

Error in test.sh

During the bbmap_indexgenome step in test.sh, I got the error below.
Can you please check?

Finished job 5.
6 of 48 steps (12%) done
ImproperOutputException in line 189 of /data/gpfs/assoc/pgl/data/Dylan/potato_cas9/crispr_DART/snakefile.py:
Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). for rule bbmap_indexgenome:
/data/gpfs/assoc/pgl/data/Dylan/potato_cas9/crispr_DART/output_test/bbmap_indexes/chrI_II
  File "/data/gpfs/home/wyim/scratch/bin/miniconda3/envs/crispr_dart/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
  File "/data/gpfs/home/wyim/scratch/bin/miniconda3/envs/crispr_dart/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Removing output files of failed job bbmap_indexgenome since they might be corrupted:
/data/gpfs/assoc/pgl/data/Dylan/potato_cas9/crispr_DART/output_test/bbmap_indexes/chrI_II
Skipped removing non-empty directory /data/gpfs/assoc/pgl/data/Dylan/potato_cas9/crispr_DART/output_test/bbmap_indexes/chrI_II
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/gpfs/assoc/pgl/data/Dylan/potato_cas9/crispr_DART/.snakemake/log/2023-06-08T143842.546646.snakemake.log

Thanks.

add aligner options in settings file

Currently, the pipeline uses bbmap for both PacBio and Illumina read alignments. I'd like to add other options, such as a Needleman-Wunsch global aligner, and let the user decide which one to use.
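For reference, the Needleman-Wunsch global alignment mentioned here can be sketched in a few lines of Python. This is a textbook illustration with a linear gap penalty, not code from the pipeline; the scoring values and the function name needleman_wunsch are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because the alignment is global, every base of both sequences appears in the output, which suits amplicon reads that are expected to span the whole target. The quadratic time and memory cost is the main trade-off for long PacBio reads.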

alignment trimming for low coverage regions

Add an option to the settings file for trimming alignments at the low-coverage 5' and 3' ends of the amplicon, to avoid looking for indels in low-coverage regions. This trimming threshold can be useful in situations where the target amplicon is not completely amplified in some samples.
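One possible reading of this proposal, sketched in Python: given the per-base coverage along an amplicon, walk inwards from each end until the coverage reaches a threshold and keep only the window in between. The function name trim_low_coverage, the min_coverage parameter, and the coverage values are hypothetical, since no such option exists in the settings file yet.

```python
def trim_low_coverage(coverage, min_coverage):
    """Return the half-open window [start, end) left after trimming both ends.

    Hypothetical helper for illustration; not part of crispr-DART.
    """
    start, end = 0, len(coverage)
    # Walk in from the 5' side until coverage reaches the threshold.
    while start < end and coverage[start] < min_coverage:
        start += 1
    # Walk in from the 3' side in the same way.
    while end > start and coverage[end - 1] < min_coverage:
        end -= 1
    return start, end


# Made-up per-base coverage along a short amplicon.
coverage = [2, 5, 40, 60, 58, 61, 12, 3]
print(trim_low_coverage(coverage, min_coverage=30))  # (2, 6)
```

Only the flanks are trimmed in this sketch, so a coverage dip in the middle of the amplicon is left in place. If no position reaches the threshold, the returned window is empty (start equals end).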

issue with running Crispr_DART pipeline

Hi,
I tried to install the Crispr_DART tool and its dependencies. I also separately downloaded GATK-v3.8-0-ge9d806836 and extracted the zip archive, which contained only the GenomeAnalysisTK.jar file. Next, I set the paths for java and gatk in the settings.yaml file, along with the locations of all input files and the output directory. Finally, when I tried to execute the program, it did not write any result files to the gatk subfolder created inside the parent output folder Crispr_DART. When I checked the log files generated for the gatk step, they showed the following error.

ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/data/ngs/programs/crispr_DART/tool_gatk_3.8/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

Please help me to resolve the issue and generate complete result files.

Thanks
Nihar
