Perturbseq Workflow

Snakemake workflow for generating read count data from CRISPR screens.

Setup

Snakemake

You will need to initialise a conda environment with Snakemake and pandas available to run this workflow, ideally using mamba:

mamba create --name snakemake 'snakemake-minimal>=5.24.1' 'pandas>=1.1'

The environments needed for the different stages of this workflow will then be created when you run the workflow (see below).

Your Data

Edit the config files config/config.yaml, config/samples.tsv and config/units.tsv according to your needs. See schemas in workflow/schemas.

config/config.yaml

  • samples:

    Path to your samples table. Does not need to be changed if you edit the config/samples.tsv file in place.

  • units:

    Path to your units table. Does not need to be changed if you edit the config/units.tsv file in place.

  • guides:

    URL (omitting the http:// prefix) of an Excel spreadsheet of guide sequences. Currently, spreadsheets in the format of the Toronto knockout libraries (e.g. TKOv3 and mTKO) are supported.

  • control_genotype:

    Control genotype for synthetic lethal/suppressor gene identification. If provided, a regression analysis is performed that compares the BAGEL and JACKS scores of each other genotype against this genotype. Assumes matching timepoints for each genotype.

  • bagel_zip:

    URL of the BAGEL GitHub repository.

  • trim:

    Optional parameter to specify the number of nucleotides to trim from the start of each read in your FASTQ files.

  • max_length:

    Optional parameter to specify the length of sequence to keep from your FASTQ files (i.e. if you need to trim from the 3' ends of your sequences).

  • species:

    Used to automatically determine which core essential and negative control gene sets to use. Either human or mouse is supported. Ignored if bagel_ess_genes and bagel_neg_genes are specified. Default = human.

  • bagel_ess_genes:

    Path to a text file containing core essential genes relevant to your study. The file must contain gene symbols as the first (or only) whitespace-separated column.

  • bagel_neg_genes:

    Path to a text file containing negative control (non-essential) genes relevant to your study. The file must contain gene symbols as the first (or only) whitespace-separated column.
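As an illustration of two of the options above, the following minimal sketch shows how a gene list file with symbols in the first whitespace-separated column could be read, and what trim and max_length mean for a single read. It is not the workflow's own code: the function names parse_gene_symbols and trim_read are hypothetical, the example gene symbols are only placeholders, and it assumes that max_length is applied after trim, which the workflow itself may handle differently.

```python
def parse_gene_symbols(lines):
    """Return the first whitespace-separated field of every non-blank line.

    `lines` may be an open file handle or any iterable of strings.
    Header rows and comment lines are NOT treated specially here.
    """
    symbols = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            symbols.append(fields[0])
    return symbols


def trim_read(sequence, trim=0, max_length=None):
    """Drop `trim` bases from the start, then keep at most `max_length` bases.

    Assumption: max_length is applied after trim (order not stated in the README).
    """
    trimmed = sequence[trim:]
    return trimmed if max_length is None else trimmed[:max_length]


# Placeholder file contents; extra columns after the symbol are ignored.
example_lines = ["RPL3\tribosomal protein\n", "\n", "POLR2A\n"]
print(parse_gene_symbols(example_lines))              # ['RPL3', 'POLR2A']
print(trim_read("ACGTACGT", trim=2, max_length=4))    # GTAC
```

To check a real file, pass an open handle, e.g. `with open(path) as fh: parse_gene_symbols(fh)`, where path is whatever you set for bagel_ess_genes or bagel_neg_genes.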

config/samples.tsv

Edit the example file to specify the individual samples and timepoints in your project.

config/units.tsv

Edit the example file to specify the individual run files in your project. A sample may have multiple FASTQ files associated with it, distinguished by the 'unit_name' column, which may hold any arbitrary value as long as it does not contain a hyphen.
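The hyphen restriction on 'unit_name' can be checked before running the workflow. The sketch below is one possible check using only the Python standard library; the helper name hyphenated_unit_names is hypothetical, and the other column names in the example table (sample_name, fq1) are assumptions made for illustration, so consult the schemas in workflow/schemas for the real layout.

```python
import csv
import io


def hyphenated_unit_names(handle):
    """Return every unit_name value that contains a hyphen.

    `handle` is an open tab-separated file with a `unit_name` column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["unit_name"] for row in reader if "-" in row["unit_name"]]


# Hypothetical table contents; only the unit_name column is named in the README.
example = io.StringIO(
    "sample_name\tunit_name\tfq1\n"
    "sampleA\tlane1\tsampleA_L001.fastq.gz\n"
    "sampleA\tlane-2\tsampleA_L002.fastq.gz\n"
)
print(hyphenated_unit_names(example))  # ['lane-2']
```

For a real table, open it with `open("config/units.tsv", newline="")` and pass the handle to the function; an empty list means no unit names need renaming.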

Running

Invoke snakemake with a number of cores that suits your hardware. The --use-conda flag will create all the necessary conda environments for the workflow.

$ snakemake --use-conda --cores 4

Example configurations for running on an SGE cluster are provided in cluster-qsub and cluster_config.yaml. With these, the workflow can be run on the cluster with a command like the following:

$ snakemake --profile cluster-qsub --cluster-config cluster_config.yaml --use-conda --cores 8

Test Data

Test data are provided in the test directory. Running the workflow with the default config/samples.tsv and config/units.tsv uses these data. The test files are intended only to check that the workflow executes; because they are small, some of the plots produced may look odd compared with those from real data.

Author

Written by David A. Parry at the University of Edinburgh.
