felicityallen / selftarget

Scripts for processing and predicting CRISPR/Cas9-generated mutations

License: MIT License

Python 83.81% Shell 0.46% CMake 0.40% C++ 15.17% Dockerfile 0.15%

selftarget's Introduction

SelfTarget

FORECasT Web server

To predict and view mutational profiles for individual gRNAs, please visit the FORECasT website at:

https://partslab.sanger.ac.uk/FORECasT

Precomputed FORECasT Results for Human and Mouse CCDS

Precomputed profiles for all gRNAs in human and mouse CCDS regions are available here:

https://fa9.cog.sanger.ac.uk/index.html

Results are grouped by CCDS id, so that each file collects all gRNAs corresponding to one CCDS id. Within each file ending in _predicted_mapped_indel_summary.txt, the entries for each gRNA are separated by a line with

@@@id guide_seq predicted_in_frame

where the id contains the CCDS id, the chromosome coordinates and the strand. The next line is '- - 1000' and can be ignored (it is there for visualization only). The following lines are the particular indels predicted and their predicted counts (assuming 1000 total reads, and ignoring indels with fewer than 1 read). For the read sequences, see the corresponding entries in the _predicted_rep_reads.txt files.

FORECasT Command line tool

Follow the installation instructions in the Installation section below.
After installation, from a command line:

cd indel_prediction
cd predictor

Run single or batch prediction as described next.

Single gRNA prediction

python FORECasT.py <target DNA sequence> <PAM index (0 based)> <output_file_prefix>

e.g.

python FORECasT.py ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC 17 test_output
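
If you are unsure which PAM index to pass, the following minimal sketch lists candidate positions. It assumes the PAM is an NGG motif and that the index refers to the first base of the PAM (which matches the example above, where position 17 starts AGG). The function name find_ngg_pam_indices is hypothetical and not part of FORECasT.

```python
import re
from typing import List


def find_ngg_pam_indices(target: str) -> List[int]:
    """Return the 0-based start of every NGG motif on the given strand."""
    target = target.upper()
    # A lookahead is used so that overlapping motifs (e.g. in "GGGG") are all found.
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", target)]


target = "ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC"
print(find_ngg_pam_indices(target))  # 17 is among the reported positions
```

Only the strand as written is scanned; a guide on the opposite strand would first need the sequence to be reverse complemented.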

Output will be in

<output_file_prefix>_predictedindelsummary.txt

A list of predicted mutations, one per line, listed in order of decreasing predicted counts. Each line contains an identifier string for the indel followed by a - (ignore this), and then a predicted read count (tab-delimited).

e.g.

-	-	1000	(always 1000 reads - it is the original template sequence - here for viewer use).
D2_L-3R0	-	550
I1_L-2C1R0	-	200
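
As a rough illustration of the format described above, the sketch below reads summary lines into (identifier, count) pairs and skips the template row. The function name parse_indel_summary is hypothetical, and counts are read as floats in case the file does not always contain integers (an assumption).

```python
from typing import Iterable, List, Tuple


def parse_indel_summary(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (indel_id, predicted_count) pairs from tab-delimited summary lines."""
    indels = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or short lines and the "-  -  1000" template row.
        if len(fields) < 3 or fields[0] == "-":
            continue
        indels.append((fields[0], float(fields[2])))
    return indels


example = ["-\t-\t1000\n", "D2_L-3R0\t-\t550\n", "I1_L-2C1R0\t-\t200\n"]
for indel_id, count in parse_indel_summary(example):
    print(indel_id, count)
```

With a real file, pass an open file handle, e.g. parse_indel_summary(open("test_output_predictedindelsummary.txt")).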

<output_file_prefix>_predictedreads.txt

A list of read sequences corresponding to each predicted mutation in the previous file. The format is read_id (ignore this), read sequence, mutation identifier (tab delimited), followed by a - (ignore this).

e.g.

ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC	-	-
ATGCTAGCTAGGGCAAGGCATGCTAGTGACTGCATGGTAC	D2_L-3R0	-
ATGCTAGCTAGGGCATGGAGGCATGCTAGTGACTGCATGGTAC	I1_L-2C1R0	-

Batch mode prediction

python FORECasT.py <batch_filename> <output_file_prefix>

e.g.

python FORECasT.py example_batch.txt test_batch_output

where batch_filename is a tab-delimited file with the columns ID, Target and PAM Index, e.g.

ID	Target	PAM Index
Guide_1	ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC	17
Guide_2	ATCGATGACTGATCGTAGCTAGCTGGGATGCTAGCTAGTTGCATGCTAGGAGTCAGCTAG	23
Guide_3	GATAGTCGTAGGCTAGCTAGCTAGCTGGCAAGTGTGGAAAAGGGGATGCATGTA	26

Output will be in <output_file_prefix>_predictedindelsummary.txt and <output_file_prefix>_predictedreads.txt

which are formatted as for single mode, but separate guides are prefaced by a line with

@@@<ID> <predicted_in_frame>

where ID is the identifier provided for the guide in the batch file, and predicted_in_frame is the predicted percentage of in-frame mutations (i.e. all insertions or deletions whose size is a multiple of 3: 3, 6, 9, etc.).
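
The batch layout can be split back into per-guide blocks along these lines. This is a sketch under assumptions: the ID is taken to be the first whitespace-separated token after @@@, and the number after the leading I or D in an identifier is taken to be the indel size. The function names and the D3_L-4R0 row are made up for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: the number after the leading I or D is the indel size.
INDEL_SIZE = re.compile(r"^[ID](\d+)_")


def split_batch_summary(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group (indel_id, count) rows under the guide ID from each @@@ line."""
    guides: Dict[str, List[Tuple[str, float]]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("@@@"):
            current = line[3:].split()[0]
            guides[current] = []
        elif current is not None and INDEL_SIZE.match(line):
            # Rows that do not look like indels (e.g. the template row) are skipped.
            fields = line.split("\t")
            guides[current].append((fields[0], float(fields[2])))
    return guides


def in_frame_percentage(indels: List[Tuple[str, float]]) -> float:
    """Percentage of listed reads whose indel size is a multiple of 3."""
    total = sum(count for _, count in indels)
    in_frame = sum(
        count for indel_id, count in indels
        if int(INDEL_SIZE.match(indel_id).group(1)) % 3 == 0
    )
    return 100.0 * in_frame / total if total else 0.0


lines = ["@@@Guide_1\t30.0", "-\t-\t1000", "D2_L-3R0\t-\t700", "D3_L-4R0\t-\t300"]
for guide_id, indels in split_batch_summary(lines).items():
    print(guide_id, in_frame_percentage(indels))
```

A value recomputed this way only covers the indels listed in the file, so it may differ slightly from the predicted_in_frame figure in the header line.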

Installation

Locally

Create a Python 3 virtual environment and activate it

# install Python dependencies

pip install -r requirements.txt
cd selftarget_pyutils
pip install -e .
cd ../indel_prediction
pip install -e .

# compile predictor

cd indel_analysis/indelmap
cmake . -DINDELMAP_OUTPUT_DIR=/usr/local/bin
make && make install
export INDELGENTARGET_EXE=/usr/local/bin/indelgentarget

Docker

Alternatively, you can start a Docker container and exec into it:

docker pull quay.io/felicityallen/selftarget
docker run -d --name selftarget quay.io/felicityallen/selftarget
docker exec -it selftarget bash

Web service

Installation

The predictor can be run as a web service, which is accessed through a separate front-end application, FORECasT (source on GitHub). The SelfTarget repository contains a Flask server with two API endpoints that FORECasT uses to access the predictor.

To run the predictor as a server, follow the local installation steps above, go to the root directory and launch

python server/server.py --port=5001

or simply run a Docker container

docker run -d --name selftarget -p 5001:8006 quay.io/felicityallen/selftarget

Development

All changes to the server must be reflected in swagger.yaml, since it is used to automatically generate clients for other services. The tests use it as well, so changes that are not reflected there should generally cause some of the tests to fail. It is handy to validate the Swagger specification with swagger validate swagger.yml

selftarget's Issues

example_batch.txt - issue with Guide #3

Hi Felicity,

I've been testing FORECasT at the command line using the provided example_batch.txt file. Small error: for Guide #3 I think the PAM index is incorrectly labelled as 26 (forcing a failure in the run) when it should be 25.

Great tool! Thanks for developing :)

Best of luck,

Kris

OSError: [Errno 36] File name too long

Hi @felicityallen,

I have the following issue when running FORECasT in Docker.

Traceback (most recent call last):
  File "FORECasT.py", line 29, in <module>
    predictMutationsSingle(target_seq, pam_idx, output_prefix)
  File "/usr/local/lib/python3.6/site-packages/predictor/predict.py", line 138, in predictMutationsSingle
    p_predict, rep_reads, in_frame_perc = predictMutations(theta_file, target_seq, pam_idx)
  File "/usr/local/lib/python3.6/site-packages/predictor/predict.py", line 51, in predictMutations
    rep_reads = fetchRepReads(tmp_genindels_file)
  File "/usr/local/lib/python3.6/site-packages/predictor/predict.py", line 23, in fetchRepReads
    f = io.open(genindels_file)
OSError: [Errno 36] File name too long: 'tmp_genindels_CACTCACGCACACTCGTACTGAGACTCAAGGCCGTCTCCACAACTCCAACCAGTGCAAATGACTTAGTGCAAATTAAATTCAGAAGGGACGGGGGAAACAGAGTCGTGGAGGCTTTGAATCTCTCAGAAAAAAGGAAAGACAGGAAAGCTCAGAAACAAAGAGACAGAAGGATGAAAAAGAAGAAGAGGGAGGTGGTGGGGACGGCGTCATCCCGCTGGAGGAGCTCAGCTCTGGGATGATGTGGTGGCTGGTGGTCAACCGTCCGCCGCAGGGGGTGGCCATGAAGATGGAGTCGCCGGTGCGGGGTGGGTGCTGCGGGCGCTGCTGTTCCGATGGTGTCTTTGATGTTGGGCTGATGAGGTCTGGTTCCTCTAGCTTCACCTAGAGATAGCGACACGTGGGTGGGATGGGGGCAGGGTGCTCGGGGGCCTGGAGGCTTCCGGAAGGAGCGCTGGTGCTCACCTTCCAGAGCCGATTCCTGAGTCAGGTAGTGCAGTGGTTGTAAAGTGCAGCATATTCATT_22081.txt'

I used the following command:

python FORECasT.py CACTCACGCACACTCGTACTGAGACTCAAGGCCGTCTCCACAACTCCAACCAGTGCAAATGACTTAGTGCAAATTAAATTCAGAAGGGACGGGGGAAACAGAGTCGTGGAGGCTTTGAATCTCTCAGAAAAAAGGAAAGACAGGAAAGCTCAGAAACAAAGAGACAGAAGGATGAAAAAGAAGAAGAGGGAGGTGGTGGGGACGGCGTCATCCCGCTGGAGGAGCTCAGCTCTGGGATGATGTGGTGGCTGGTGGTCAACCGTCCGCCGCAGGGGGTGGCCATGAAGATGGAGTCGCCGGTGCGGGGTGGGTGCTGCGGGCGCTGCTGTTCCGATGGTGTCTTTGATGTTGGGCTGATGAGGTCTGGTTCCTCTAGCTTCACCTAGAGATAGCGACACGTGGGTGGGATGGGGGCAGGGTGCTCGGGGGCCTGGAGGCTTCCGGAAGGAGCGCTGGTGCTCACCTTCCAGAGCCGATTCCTGAGTCAGGTAGTGCAGTGGTTGTAAAGTGCAGCATATTCATT 270 .

It may not be a good idea to embed the sequence itself in the file name, since this fails for long target sequences.
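
One possible way around this, sketched here and not taken from the SelfTarget code, is to derive the temporary file name from a short hash of the sequence, or to let the tempfile module pick a name. Both helper names below are hypothetical; only the tmp_genindels_ prefix is taken from the traceback above.

```python
import hashlib
import os
import tempfile


def short_tmp_name(target_seq: str, prefix: str = "tmp_genindels_") -> str:
    """Build a fixed-length file name from a hash of the target sequence."""
    digest = hashlib.sha1(target_seq.encode("ascii")).hexdigest()[:16]
    # The process id keeps concurrent runs on the same sequence apart.
    return "%s%s_%d.txt" % (prefix, digest, os.getpid())


def make_tmp_file(prefix: str = "tmp_genindels_") -> str:
    """Alternative: let the standard library create a uniquely named file."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".txt")
    os.close(fd)
    return path


print(short_tmp_name("ACGT" * 200))
```

Either variant keeps the name well below the 255-byte limit that most Linux file systems place on a single file name, regardless of target length.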

Inconsistent indels generated by Indelgen when using REVERSE sequences

Hi @felicityallen,

There seems to be an issue with indelgen: the potential indel profiles it generates are not consistent between a sequence and its reverse complement.

Consider the following example:

>Oligo2_GGTTGAAAGTCTATAGTGGT 49 REVERSE
AAATGCGTAACAAAAACAAGCTCGGTATCTGGCCTATTCCACGCCACCCACCACTATAGACTTTCAACATTGTATTGTC

When we run indelgen on this sequence, we get the following possible inserts:

I1_L-1C2R2	
I1_L0R1	
I2_L-1C1R1	
I2_L-1C2R2	
I2_L-3C3R1	
I2_L0C1R2	
I2_L0C3R4	
I2_L0R1

If we take the reverse complement of Oligo_2, and take the PAM location as 79 (sequence length) - 49 = 30, as below

>Oligo2_GGTTGAAAGTCTATAGTGGT_FORWARD 30 FORWARD
GACAATACAATGTTGAAAGTCTATAGTGGTGGGTGGCGTGGAATAGGCCAGATACCGAGCTTGTTTTTGTTACGCATTT

we get a different set of indels:

I1_L-1C2R2	
I1_L-1R0	2	
I1_L-2C1R0	
I2_L-1C1R1	
I2_L-1C2R2	
I2_L-1R0	9	
I2_L-2C1R0	
I2_L-3C3R1

I have tried this for multiple examples. I would expect the set of returned indels to be the same. Maybe you know what the issue is here? I am not familiar with C++ myself, so it will take me a while to familiarise myself enough to be able to debug this problem
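
For anyone reproducing this, the conversion described above (reverse complement the sequence and use sequence length minus the PAM location) can be sketched as follows. The helper names are hypothetical, and the sketch only reproduces the inputs from this report; it does not explain the difference in indelgen output.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_to_forward(seq: str, pam_idx: int):
    """Convert a REVERSE entry to its FORWARD equivalent as done in this report."""
    return reverse_complement(seq), len(seq) - pam_idx


reverse_seq = "AAATGCGTAACAAAAACAAGCTCGGTATCTGGCCTATTCCACGCCACCCACCACTATAGACTTTCAACATTGTATTGTC"
forward_seq, forward_pam = reverse_to_forward(reverse_seq, 49)
print(forward_pam, forward_seq[forward_pam:forward_pam + 3])
```

For the Oligo2 example this gives PAM location 30, and the three bases starting there on the forward sequence are GGG.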

Wrong file path in predict.py

Dear @felicityallen ,

I could not make FORECasT run locally until I noticed a typo in line 17 of predict.py:

INDELGENTARGET_EXE = os.getenv("INDELGENTARGET_EXE", 
     "C:/Users/fa9/postdoc/indelmap/build/Release/indelgentarget.exe")

I guess the path to the indelgentarget executable should be the same as in the installation instructions, i.e. /usr/local/bin/indelgentarget. After I changed that, the tool started working. =)
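
A more defensive lookup could check that the executable exists before it is used. The sketch below is an illustration only: resolve_indelgentarget is a hypothetical helper, and the default path is the one given in the installation instructions.

```python
import os
import shutil

# Default taken from the installation instructions; adjust for your system.
DEFAULT_EXE = "/usr/local/bin/indelgentarget"


def resolve_indelgentarget() -> str:
    """Return the indelgentarget path, preferring the INDELGENTARGET_EXE variable."""
    exe = os.getenv("INDELGENTARGET_EXE", DEFAULT_EXE)
    # shutil.which returns None if the path is missing or not executable.
    if shutil.which(exe) is None:
        raise FileNotFoundError(
            "indelgentarget not found at %r; set INDELGENTARGET_EXE" % exe
        )
    return exe
```

Failing early with a clear message makes a wrong default path easier to spot than an error raised later inside the predictor.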

Vlada

Identification of inserted/deleted bases on tag name

Hello Felicity!

So I have been using FORECasT from the command line, but was wondering: is there any way to reconstruct the identity of the inserted/deleted bases without realigning the predicted read to the original input? I saw that you have an extended tag name in the generated indel summary file, but I can't make heads or tails of whether the tag name contains the information on which indices to extract.

For example I have the following:

-	-	1000
D7_L-9C3R2	-	121

Which corresponds to the following alignment

AAATTCCAGACAAGTTTGTTGTAGGATATGCCCTTGACTATAATGAATACTTCAGGGATTTGAATGTAAGTAATTGCTTCTTTTTCTCACTCATTTTTCA
AAATTCCAGACAAGTTTGTTGTAGGATATGCCCTTGACTATA-------CTTCAGGGATTTGAATGTAAGTAATTGCTTCTTTTTCTCACTCATTTTTCA

Could you shed some light on whether it is possible to use this tag name to extract indices of the deleted bases? If not, is there any reasoning to how this tag name is generated (besides the I#/D# portion)?
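
Pending an authoritative answer, the tag can at least be split into its fields. The sketch below only reflects the syntax visible in the tags on this page (I or D, a size, then L, an optional C, and R values); what L, C and R mean is not documented here, so the field names are assumptions and parse_indel_tag is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class IndelTag(NamedTuple):
    kind: str          # "I" or "D"
    size: int          # number following I or D
    left: int          # value after L (meaning assumed, not documented here)
    c: Optional[int]   # value after C, if present
    right: int         # value after R


TAG = re.compile(r"^([ID])(\d+)_L(-?\d+)(?:C(\d+))?R(-?\d+)$")


def parse_indel_tag(tag: str) -> IndelTag:
    """Split a tag such as D7_L-9C3R2 into its numeric fields."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError("unrecognised indel tag: %r" % tag)
    kind, size, left, c, right = m.groups()
    return IndelTag(kind, int(size), int(left),
                    int(c) if c is not None else None, int(right))


print(parse_indel_tag("D7_L-9C3R2"))
print(parse_indel_tag("D2_L-3R0"))
```

As an observation only, for the two deletion tags shown on this page, R - L - 1 - C (with C taken as 0 when absent) equals the deletion size: 7 for D7_L-9C3R2 and 2 for D2_L-3R0.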

Thanks,
Gavin

Error: 502 Bad Gateway

Hi!
I tried to use the online tool at https://partslab.sanger.ac.uk/FORECasT, but it returns a blank page with the error message: 502 Bad Gateway
I guess something is wrong somewhere. I used it in the past and it was working well. Thanks for this great tool!
