selftarget's Introduction

SelfTarget

Scripts for processing and predicting CRISPR/Cas9-generated mutations

FORECasT Web server

To predict and view mutational profiles for individual gRNAs, please visit the FORECasT website at:

https://partslab.sanger.ac.uk/FORECasT

Precomputed FORECasT Results for Human and Mouse CCDS

Precomputed profiles for all gRNAs in human and mouse CCDS regions are available here:

https://fa9.cog.sanger.ac.uk/index.html

Entries are grouped into one file per CCDS id, covering all gRNAs for that id. Within each file ending in _predicted_mapped_indel_summary.txt, the entries for each gRNA are separated by a line with

@@@id guide_seq predicted_in_frame 

where the id contains the CCDS id, the chromosome coordinates and the strand. The next line is '- - 1000' and can be ignored (it is there for visualization only). The following lines list the predicted indels and their predicted counts (assuming 1000 total reads, and omitting indels with fewer than 1 read). For the read sequences, see the corresponding entries in the _predicted_rep_reads.txt files.
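
The layout described above can be read with a few lines of Python. This is a minimal sketch, not part of SelfTarget: the function name `parse_indel_summary` is hypothetical, the header is assumed to be whitespace-separated with no spaces inside the id, predicted_in_frame is assumed to be numeric, and indel lines are assumed to be tab-delimited as in the command line tool's output.

```python
def parse_indel_summary(text):
    """Group summary lines by gRNA, keyed on the id from each @@@ header."""
    guides = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("@@@"):
            # Assumed header layout: @@@id guide_seq predicted_in_frame
            guide_id, guide_seq, in_frame = line[3:].split()[:3]
            current = {"guide_seq": guide_seq,
                       "in_frame": float(in_frame),
                       "indels": []}
            guides[guide_id] = current
            continue
        fields = line.split("\t")
        if current is None or len(fields) < 3 or fields[0] == "-":
            continue  # skips the '- - 1000' template line
        # fields: indel identifier, '-', predicted count
        current["indels"].append((fields[0], float(fields[2])))
    return guides


# Hypothetical sample; the id and guide sequence are made up for illustration.
sample = (
    "@@@CCDS1_example ACGTACGTACGTACGTACGT 42.0\n"
    "-\t-\t1000\n"
    "D2_L-3R0\t-\t550\n"
    "I1_L-2C1R0\t-\t200\n"
)
for guide_id, record in parse_indel_summary(sample).items():
    print(guide_id, record["in_frame"], record["indels"])
```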

FORECasT Command line tool

  1. Follow the instructions in the Installation section below.

  2. After installation, from a command line:

cd indel_prediction
cd predictor

  3. Run single or batch prediction as described next.

Single gRNA prediction

python FORECasT.py <target DNA sequence> <PAM index (0 based)> <output_file_prefix>

e.g.

python FORECasT.py ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC 17 test_output
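
In the example above, index 17 (0 based) is the first base of the PAM AGG. Before running a prediction it can help to confirm that the chosen index really points at an NGG site. The sketch below is not part of FORECasT; the helper names are hypothetical, and it assumes a forward-strand NGG PAM whose index refers to the N, as inferred from the examples in this document.

```python
def is_ngg_pam(target, pam_index):
    """Return True if target[pam_index:pam_index + 3] looks like NGG."""
    if pam_index < 0:
        return False
    pam = target[pam_index:pam_index + 3].upper()
    return len(pam) == 3 and pam[1:] == "GG"


def candidate_pam_indices(target):
    """List every 0-based index at which an NGG site starts."""
    return [i for i in range(len(target) - 2) if is_ngg_pam(target, i)]


target = "ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC"
print(is_ngg_pam(target, 17))         # True
print(candidate_pam_indices(target))  # [9, 10, 17, 36]
```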

Output will be in

<output_file_prefix>_predictedindelsummary.txt

A list of predicted mutations, one per line, listed in order of decreasing predicted counts. Each line contains an identifier string for the indel followed by a - (ignore this), and then a predicted read count (tab-delimited).

e.g.

-	-	1000	(always 1000 reads - it is the original template sequence - here for viewer use).
D2_L-3R0	-	550
I1_L-2C1R0	-	200

<output_file_prefix>_predictedreads.txt

A list of read sequences corresponding to each predicted mutation in the previous file. As in the example below, each line holds the read sequence, the mutation identifier, and a - (ignore this), tab-delimited.

e.g.

ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC	-	-
ATGCTAGCTAGGGCAAGGCATGCTAGTGACTGCATGGTAC	D2_L-3R0	-
ATGCTAGCTAGGGCATGGAGGCATGCTAGTGACTGCATGGTAC	I1_L-2C1R0	-
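
Because both output files share the mutation identifier, they can be joined to see each predicted read next to its count. The following is a minimal sketch rather than anything shipped with FORECasT: `join_predictions` is a hypothetical name, and the column order is assumed to match the two examples above.

```python
def join_predictions(summary_lines, reads_lines):
    """Pair each mutation identifier with its predicted count and read."""
    counts = {}
    for line in summary_lines:
        fields = line.rstrip("\n").split("\t")
        # Summary columns: identifier, '-', count; skip the template line.
        if len(fields) >= 3 and fields[0] != "-":
            counts[fields[0]] = float(fields[2])
    joined = []
    for line in reads_lines:
        fields = line.rstrip("\n").split("\t")
        # Reads columns (assumed): read sequence, identifier, '-'.
        if len(fields) >= 2 and fields[1] in counts:
            joined.append((fields[1], counts[fields[1]], fields[0]))
    # Highest predicted count first.
    return sorted(joined, key=lambda row: -row[1])


summary = ["-\t-\t1000", "D2_L-3R0\t-\t550", "I1_L-2C1R0\t-\t200"]
reads = [
    "ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC\t-\t-",
    "ATGCTAGCTAGGGCAAGGCATGCTAGTGACTGCATGGTAC\tD2_L-3R0\t-",
    "ATGCTAGCTAGGGCATGGAGGCATGCTAGTGACTGCATGGTAC\tI1_L-2C1R0\t-",
]
for indel, count, read in join_predictions(summary, reads):
    print(indel, count, read)
```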

Batch mode prediction

python FORECasT.py <batch_filename> <output_file_prefix>

e.g.

python FORECasT.py example_batch.txt test_batch_output

where batch_filename is a tab-delimited file with columns: ID, Target, PAM Index e.g.

ID	Target	PAM Index
Guide_1	ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC	17
Guide_2	ATCGATGACTGATCGTAGCTAGCTGGGATGCTAGCTAGTTGCATGCTAGGAGTCAGCTAG	23
Guide_3	GATAGTCGTAGGCTAGCTAGCTAGCTGGCAAGTGTGGAAAAGGGGATGCATGTA	25

Output will be in <output_file_prefix>_predictedindelsummary.txt and <output_file_prefix>_predictedreads.txt

which are formatted as for single mode, but separate guides are prefaced by a line with

@@@<ID> <predicted_in_frame>

where ID is the identifier provided for the guide in the batch file, and predicted_in_frame is the predicted percentage of in-frame mutations (i.e. all insertions or deletions whose size is a multiple of 3: 3, 6, 9, etc.).
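
The in-frame definition above can be reproduced approximately from the listed counts. This sketch is an illustration only: `in_frame_percentage` is a hypothetical helper, it assumes the indel size is the number that follows the leading I or D in the identifier, and because the summary omits indels with fewer than 1 read, its result may differ slightly from the value FORECasT reports.

```python
import re

# Leading I or D followed by the indel size, e.g. D2_L-3R0 -> size 2.
INDEL_SIZE = re.compile(r"^[ID](\d+)_")


def in_frame_percentage(indel_counts):
    """Percentage of counted reads whose indel size is a multiple of 3."""
    total = 0.0
    in_frame = 0.0
    for indel, count in indel_counts:
        match = INDEL_SIZE.match(indel)
        if match is None:
            continue  # e.g. the '-' template line
        total += count
        if int(match.group(1)) % 3 == 0:
            in_frame += count
    return 100.0 * in_frame / total if total else 0.0


# D3_L-4R0 is a made-up identifier used only to show an in-frame case.
counts = [("-", 1000), ("D2_L-3R0", 550), ("I1_L-2C1R0", 200), ("D3_L-4R0", 250)]
print(in_frame_percentage(counts))  # 25.0
```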

Installation

Locally

Create a Python 3 virtual environment and activate it

# install Python dependencies

pip install -r requirements.txt
cd selftarget_pyutils
pip install -e .
cd ../indel_prediction
pip install -e .

# compile predictor

cd indel_analysis/indelmap
cmake . -DINDELMAP_OUTPUT_DIR=/usr/local/bin
make && make install
export INDELGENTARGET_EXE=/usr/local/bin/indelgentarget
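
Since the predictor locates the compiled binary through the INDELGENTARGET_EXE environment variable, a quick check that the variable resolves to an executable can save a confusing failure later. This is a hedged sketch: `resolve_indelgentarget` is a hypothetical helper, and the fallback path is simply the one used in the steps above.

```python
import os
import shutil


def resolve_indelgentarget(default="/usr/local/bin/indelgentarget"):
    """Return the executable path if it exists and is runnable, else None."""
    path = os.getenv("INDELGENTARGET_EXE", default)
    # shutil.which returns None when the file is missing or not executable.
    return shutil.which(path)


if __name__ == "__main__":
    exe = resolve_indelgentarget()
    if exe is None:
        print("indelgentarget not found; check INDELGENTARGET_EXE")
    else:
        print("Using", exe)
```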

Docker

Alternatively, you can start a Docker container and exec into it:

docker pull quay.io/felicityallen/selftarget
docker run -d --name selftarget quay.io/felicityallen/selftarget
docker exec -it selftarget bash

Web service

Installation

The predictor can be run as a web service. It can be accessed through a separate front-end application, FORECasT (source on GitHub). The SelfTarget repository contains a Flask server with two API endpoints that FORECasT uses to access the predictor.

To run predictor as a server, you can follow the local installation steps above, go to the root directory and launch

python server/server.py --port=5001

or simply run a Docker container

docker run -d --name selftarget -p 5001:8006 quay.io/felicityallen/selftarget

Development

All changes to the server must be reflected in swagger.yaml, since it is used to generate clients for other services automatically. The tests use it as well, so any change that is not reflected there should cause some of the tests to fail. It is handy to validate the Swagger specification with swagger validate swagger.yml

selftarget's People

Contributors

anton-khodak, felicityallen, felicity-exed

selftarget's Issues

example_batch.txt - issue with Guide #3

Hi Felicity,

I've been testing FORECasT at the command line using the provided example_batch.txt file. Small error: for Guide #3 I think the PAM index is incorrectly labelled as 26 (causing the run to fail) when it should be 25.

Great tool! Thanks for developing :)

Best of luck,

Kris

OSError: [Errno 36] File name too long

Hi @felicityallen,

I have the following issue when running FORECasT in Docker.

Traceback (most recent call last):
  File "FORECasT.py", line 29, in <module>
    predictMutationsSingle(target_seq, pam_idx, output_prefix)
  File "/usr/local/lib/python3.6/site-packages/predictor/predict.py", line 138, in predictMutationsSingle
    p_predict, rep_reads, in_frame_perc = predictMutations(theta_file, target_seq, pam_idx)
  File "/usr/local/lib/python3.6/site-packages/predictor/predict.py", line 51, in predictMutations
    rep_reads = fetchRepReads(tmp_genindels_file)
  File "/usr/local/lib/python3.6/site-packages/predictor/predict.py", line 23, in fetchRepReads
    f = io.open(genindels_file)
OSError: [Errno 36] File name too long: 'tmp_genindels_CACTCACGCACACTCGTACTGAGACTCAAGGCCGTCTCCACAACTCCAACCAGTGCAAATGACTTAGTGCAAATTAAATTCAGAAGGGACGGGGGAAACAGAGTCGTGGAGGCTTTGAATCTCTCAGAAAAAAGGAAAGACAGGAAAGCTCAGAAACAAAGAGACAGAAGGATGAAAAAGAAGAAGAGGGAGGTGGTGGGGACGGCGTCATCCCGCTGGAGGAGCTCAGCTCTGGGATGATGTGGTGGCTGGTGGTCAACCGTCCGCCGCAGGGGGTGGCCATGAAGATGGAGTCGCCGGTGCGGGGTGGGTGCTGCGGGCGCTGCTGTTCCGATGGTGTCTTTGATGTTGGGCTGATGAGGTCTGGTTCCTCTAGCTTCACCTAGAGATAGCGACACGTGGGTGGGATGGGGGCAGGGTGCTCGGGGGCCTGGAGGCTTCCGGAAGGAGCGCTGGTGCTCACCTTCCAGAGCCGATTCCTGAGTCAGGTAGTGCAGTGGTTGTAAAGTGCAGCATATTCATT_22081.txt'

I used the following command:

python FORECasT.py CACTCACGCACACTCGTACTGAGACTCAAGGCCGTCTCCACAACTCCAACCAGTGCAAATGACTTAGTGCAAATTAAATTCAGAAGGGACGGGGGAAACAGAGTCGTGGAGGCTTTGAATCTCTCAGAAAAAAGGAAAGACAGGAAAGCTCAGAAACAAAGAGACAGAAGGATGAAAAAGAAGAAGAGGGAGGTGGTGGGGACGGCGTCATCCCGCTGGAGGAGCTCAGCTCTGGGATGATGTGGTGGCTGGTGGTCAACCGTCCGCCGCAGGGGGTGGCCATGAAGATGGAGTCGCCGGTGCGGGGTGGGTGCTGCGGGCGCTGCTGTTCCGATGGTGTCTTTGATGTTGGGCTGATGAGGTCTGGTTCCTCTAGCTTCACCTAGAGATAGCGACACGTGGGTGGGATGGGGGCAGGGTGCTCGGGGGCCTGGAGGCTTCCGGAAGGAGCGCTGGTGCTCACCTTCCAGAGCCGATTCCTGAGTCAGGTAGTGCAGTGGTTGTAAAGTGCAGCATATTCATT 270 .

It may not be a good idea to embed the sequence itself in the temporary file name, since for long target sequences the name exceeds the file system's limit.
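
One possible workaround, sketched here and not taken from the repository, is to build the temporary name from a short hash of the sequence instead of the sequence itself. The helper name `short_tmp_name` is hypothetical; the `tmp_genindels_` prefix and the numeric suffix follow the pattern visible in the traceback above.

```python
import hashlib


def short_tmp_name(target_seq, suffix):
    """Build a bounded-length temp file name from a hash of the sequence."""
    # 16 hex characters keep the name short while making clashes unlikely.
    digest = hashlib.sha1(target_seq.encode("ascii")).hexdigest()[:16]
    return "tmp_genindels_%s_%s.txt" % (digest, suffix)


long_target = "ACGT" * 200  # 800 bases, far beyond a 255-character name limit
name = short_tmp_name(long_target, 22081)
print(len(name), name)
```

The name stays the same length regardless of how long the target sequence is, and the same sequence always maps to the same name.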

Inconsistent indels generated by Indelgen when using REVERSE sequences

Hi @felicityallen,

There seems to be an issue with indelgen: the potential indel profiles it generates for a sequence are not consistent with those generated for its reverse complement.

Consider the following example:

>Oligo2_GGTTGAAAGTCTATAGTGGT 49 REVERSE
AAATGCGTAACAAAAACAAGCTCGGTATCTGGCCTATTCCACGCCACCCACCACTATAGACTTTCAACATTGTATTGTC

When we run indelgen on this sequence, we get the following possible inserts:

I1_L-1C2R2	
I1_L0R1	
I2_L-1C1R1	
I2_L-1C2R2	
I2_L-3C3R1	
I2_L0C1R2	
I2_L0C3R4	
I2_L0R1	

If we take the reverse complement of Oligo_2, and take the PAM location as 79 (sequence length) - 49 = 30, as below

>Oligo2_GGTTGAAAGTCTATAGTGGT_FORWARD 30 FORWARD
GACAATACAATGTTGAAAGTCTATAGTGGTGGGTGGCGTGGAATAGGCCAGATACCGAGCTTGTTTTTGTTACGCATTT

we get a different set of indels:

I1_L-1C2R2	
I1_L-1R0	2	
I1_L-2C1R0	
I2_L-1C1R1	
I2_L-1C2R2	
I2_L-1R0	9	
I2_L-2C1R0	
I2_L-3C3R1	

I have tried this for multiple examples. I would expect the set of returned indels to be the same. Maybe you know what the issue is here? I am not familiar with C++ myself, so it will take me a while to familiarise myself enough to be able to debug this problem.
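
The coordinate conversion used in this report can be checked independently of indelgen. The sketch below only verifies the reverse complement and the index arithmetic (79 - 49 = 30); the helper names are hypothetical, and the flip rule is the one applied in the report, not a confirmed convention of the tool.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_pam_index(seq_len, pam_index):
    """Map a PAM index onto the opposite strand, as done in the report."""
    return seq_len - pam_index


reverse_seq = ("AAATGCGTAACAAAAACAAGCTCGGTATCTGGCCTATTCC"
               "ACGCCACCCACCACTATAGACTTTCAACATTGTATTGTC")
forward_seq = reverse_complement(reverse_seq)
forward_pam = flip_pam_index(len(reverse_seq), 49)

print(forward_seq)
print(forward_pam, forward_seq[forward_pam:forward_pam + 3])  # 30 GGG
```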

Wrong file path in predict.py

Dear @felicityallen ,

I could not make FORECasT run locally until I noticed a wrong default path in line 17 of predict.py:

INDELGENTARGET_EXE = os.getenv("INDELGENTARGET_EXE", 
     "C:/Users/fa9/postdoc/indelmap/build/Release/indelgentarget.exe")

I guess the path to the indelgentarget executable should be the same as in the installation instructions, i.e. /usr/local/bin/indelgentarget. After I changed that, the tool started working. =)

Vlada

Identification of inserted/deleted bases on tag name

Hello Felicity!

So I have been utilizing FORECasT from the command line, but was wondering, is there any way to reconstruct the identity of the inserted/deleted bases without realigning the predicted read to original input? I saw that you have an extended tag name in the generated indel summary file, but can't seem to make heads or tails if the tag name has the information on which indices to extract to get this information.

For example I have the following:

-	-	1000
D7_L-9C3R2	-	121

Which corresponds to the following alignment

AAATTCCAGACAAGTTTGTTGTAGGATATGCCCTTGACTATAATGAATACTTCAGGGATTTGAATGTAAGTAATTGCTTCTTTTTCTCACTCATTTTTCA
AAATTCCAGACAAGTTTGTTGTAGGATATGCCCTTGACTATA-------CTTCAGGGATTTGAATGTAAGTAATTGCTTCTTTTTCTCACTCATTTTTCA

Could you shed some light on whether it is possible to use this tag name to extract indices of the deleted bases? If not, is there any reasoning to how this tag name is generated (besides the I#/D# portion)?

Thanks,
Gavin
