
maximilianh / crisporwebsite


All source code of the crispor.org website

Home Page: http://crispor.org

License: Other

Python 4.90% Makefile 4.15% C++ 12.47% C 35.69% Perl 1.42% Shell 1.30% PHP 0.01% HTML 34.88% CSS 0.24% R 0.01% Java 0.34% M4 0.49% Roff 1.67% JavaScript 0.70% AngelScript 0.01% PostScript 1.64% Fortran 0.03% Module Management System 0.02% XS 0.05% DIGITAL Command Language 0.01%

crisporwebsite's People

Contributors

flashlab, gregdingle, gunjn5147, jpconcordet, maximilianh, rrohit-12, swetabhpathak

crisporwebsite's Issues

format for custom genome

Dear,
I want to add my research genome to CRISPOR. I noticed that the example in the "Adding a genome" section uses a repeat-masked genome FASTA file.
My question is: why not use the original assembled genome? And of the original assembly, a hard-masked version and a soft-masked version, which is best?

Any help is much appreciated.
Thanks.

Best regards
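The three options differ as follows. In a soft-masked FASTA, repeats are kept but written in lowercase. In a hard-masked FASTA, repeat bases are replaced by N. An unmasked assembly has neither. The sketch below uses a hypothetical helper (not part of CRISPOR) that counts bases in each class, so you can tell which kind of file you have. N also marks assembly gaps, so a nonzero N count alone does not prove hard masking.

```python
def masking_profile(seq):
    """Count the bases of a FASTA sequence string by masking class.

    upper: unmasked bases (ACGT)
    soft:  soft-masked bases, i.e. lowercase acgt (sequence kept, case changed)
    hard:  N/n, from hard masking or from assembly gaps
    other: IUPAC ambiguity codes and anything else
    """
    counts = {"upper": 0, "soft": 0, "hard": 0, "other": 0}
    for base in seq:
        if base in "ACGT":
            counts["upper"] += 1
        elif base in "acgt":
            counts["soft"] += 1
        elif base in "Nn":
            counts["hard"] += 1
        else:
            counts["other"] += 1
    return counts

print(masking_profile("ACGTacgtNNR"))
# {'upper': 4, 'soft': 4, 'hard': 2, 'other': 1}
```

Soft masking only changes case, so case-insensitive tools still see the repeat sequence. Hard masking removes the repeat bases altogether.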

summary

From Jubina:

First, the library dependencies libssl, libpng and libcrypto caused errors, so I created symbolic links to make the program pick up the correct libraries.

Second, there is the dbm problem (which I already discussed).

Finally, if we are using a conda environment, we need to change /usr/local/bin to our conda bin directory.
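Some scripts hard-code /usr/local/bin, though the report does not say which files. Instead of editing that path by hand, one option is to derive the directory from the active conda environment. This sketch assumes the environment was activated with `conda activate`, which sets the CONDA_PREFIX variable. The function name `tool_bin_dir` is hypothetical.

```python
import os

def tool_bin_dir(default="/usr/local/bin"):
    """Return the bin/ directory of the active conda environment.

    Falls back to `default` when CONDA_PREFIX is unset, i.e. when no conda
    environment is active.
    """
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        return os.path.join(prefix, "bin")
    return default

print(tool_bin_dir())
```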

a few suggestions to facilitate local installation and execution

Hi,
I'm currently using CRISPOR; thanks for sharing the code.

I cloned the repo on 17 September 2018 (commit 2d08207).

I ran into a few hurdles during installation and execution (despite a nice README), which I overcame as follows:
I used a virtual environment with Python 2.7.7, which helped me install the correct versions of the dependencies.
However, this can break the execution of some of the utility tools because of inconsistencies between Python 3 (the default on my system) and the virtual environment's Python 2.7.
Using #!/usr/bin/env python as the shebang in all scripts saved the day, since it runs whichever python comes first on the PATH (the virtual environment's, once activated). I think you could replace the shebangs systematically.
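Below is a minimal sketch of that systematic replacement. `fix_shebang` and `fix_tree` are hypothetical helpers, not part of the repository. The sketch only rewrites shebangs that name a bare python, python2 or python2.x interpreter with no extra arguments. On Linux, everything after `env` on a shebang line is passed as a single argument, so `#!/usr/bin/env python -u` would fail. Existing python3 and env shebangs are left alone.

```python
import os
import re

# An absolute python / python2 / python2.x interpreter path with nothing
# after it, e.g. "#!/usr/bin/python" or "#!/usr/local/bin/python2.7".
OLD_SHEBANG = re.compile(rb"^#!\s*/\S*/python(?:2(?:\.[0-9]+)?)?\s*$")
NEW_SHEBANG = b"#!/usr/bin/env python"

def fix_shebang(path):
    """Replace a hard-coded python shebang; return True if the file changed.

    The file is handled as bytes so non-ASCII content elsewhere is untouched,
    and it is rewritten in place, which keeps its permission bits.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    first_line, sep, rest = data.partition(b"\n")
    if not OLD_SHEBANG.match(first_line):
        return False
    with open(path, "wb") as fh:
        fh.write(NEW_SHEBANG + sep + rest)
    return True

def fix_tree(root):
    """Apply fix_shebang to every file under root; return the changed paths."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if fix_shebang(path):
                changed.append(path)
    return changed
```

For example, `fix_tree("crisporWebsite/bin")` would list the scripts it rewrote. Files with arguments on the shebang line still need a manual look.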

bedBetween and bedOverlapMerge raised an indentation error; I corrected them in my editor.

In crisprAddGenome, line 991, before invoking ./makeGenomeInfo, the script tries to cd into "/data/www/crispor", which is not necessarily an existing path, nor the correct one, depending on where the script is stored.
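One way to avoid the hard-coded directory is to derive it from the script's own location. This sketch assumes crisprAddGenome sits in a tools/ directory directly under the CRISPOR checkout, as the ../../tools/crisprAddGenome invocation in another issue on this page suggests. It also assumes ./makeGenomeInfo is resolved relative to that root, as the original cd implies. Both function names are hypothetical.

```python
import os
import subprocess

def crispor_root(script_path, levels_up=1):
    """Return the directory `levels_up` levels above the script's directory.

    realpath() resolves symlinks, so a symlinked copy of the script still
    points at the real checkout. With levels_up=1, a script stored at
    <root>/tools/crisprAddGenome yields <root>.
    """
    path = os.path.dirname(os.path.realpath(script_path))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path

def run_make_genome_info(args, script_path):
    """Run ./makeGenomeInfo with the checkout root as working directory,
    instead of a hard-coded /data/www/crispor."""
    return subprocess.call(["./makeGenomeInfo"] + list(args),
                           cwd=crispor_root(script_path))
```

Inside the script itself this would be called as `run_make_genome_info(args, __file__)`. The arguments makeGenomeInfo actually takes are not shown in the report, so they are left to the caller.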
Hope it helps :)

Thanks again, will let you know how it runs

Error when putting the genome onto the ramdisk

Hello, and many thanks for writing CRISPOR!!

I have been able to locally install CRISPOR on an Ubuntu 18.04 OS, in order to run it from the command line. I downloaded the hg38 genome from http://crispor.tefor.net/genomes/hg38anset/, and everything works just fine. The off-target and on-target scores are identical to those from the CRISPOR website.

However, it is rather slow. I tried to follow your advice here on GitHub to put the genome on the RAM disk, but I keep encountering the same error:

```
INFO:root:Using bedtools and genome fasta on ramdisk, /dev/shm/hg38anset.fa
index file /dev/shm/hg38anset.fa.fai not found, generating...
Traceback (most recent call last):
  File "bin/filterFaToBed", line 182, in <module>
    main()
  File "bin/filterFaToBed", line 151, in main
    if bool(int(isRep)):
ValueError: invalid literal for int() with base 10: '0::chr1:18216-18239(-)'

real 0m2.599s
user 0m1.981s
sys 0m0.565s
ERROR:root:Error: could not run command set -o pipefail; time bedtools getfasta -s -name -fi /dev/shm/hg38anset.fa -bed /tmp/crisporjom1oC/KU3v9oQSzWkIKX81pqP4.matches.bed -fo /dev/stdout | bin/filterFaToBed /tmp/crisporjom1oC/KU3v9oQSzWkIKX81pqP4.fa NGG NAG,NGA 1.0 > /tmp/crisporjom1oC/KU3v9oQSzWkIKX81pqP4.filtMatches.bed.
```

I get this error regardless of whether I use FASTA files or .bed files as the input. Do you have any idea what might be causing this issue?

And do you have any other tips for speeding up the scoring of many thousands of guides (with a .bed file as the input)?
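The traceback points at the cause. filterFaToBed expects the value it reads as isRep to be 0 or 1. The header written by `bedtools getfasta -name` on this RAM-disk code path instead arrives as `0::chr1:18216-18239(-)`, with the interval appended. A minimal workaround is to strip that suffix before parsing. This assumes the suffix always starts with `::` and that nothing before it contains `::`. The helper name is hypothetical, and exactly where filterFaToBed splits the header is not shown in the traceback.

```python
def split_getfasta_name(name):
    """Strip a '::chrom:start-end(strand)' suffix from a getfasta header name.

    Everything from the first '::' onwards is dropped; names without the
    suffix are returned unchanged.
    """
    return name.split("::", 1)[0]

isRep = split_getfasta_name("0::chr1:18216-18239(-)")
print(bool(int(isRep)))  # False
```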

potential bug, hitBestMismCount not updated with the minimal value

Looking at this code:

    oldMismCount = hitBestMismCount.get(hitId, 9999)
    if mismCount < oldMismCount:
        hit = (mismCount, guideIdWithMod, strand, chrom, start, tSeq, modifParts)
        posToHit[hitId] = hit

Wasn't hitBestMismCount supposed to be updated with the newly found minimum?

    oldMismCount = hitBestMismCount.get(hitId, 9999)
    if mismCount < oldMismCount:
        hit = (mismCount, guideIdWithMod, strand, chrom, start, tSeq, modifParts)
        posToHit[hitId] = hit
        hitBestMismCount[hitId] = mismCount  # bugfix

As it stands now, hitBestMismCount is never written, so oldMismCount is always 9999 and whichever hit happens to come last wins.
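The effect can be reproduced with a simplified stand-in for the loop quoted above. Here `payload` replaces the remaining tuple fields, and `best_hits` is a hypothetical name. Setting `update_best=False` gives the current behaviour, and `update_best=True` gives the proposed fix.

```python
def best_hits(matches, update_best=True):
    """Keep, per hitId, the match with the fewest mismatches.

    `matches` is an iterable of (hitId, mismCount, payload) tuples. With
    update_best=False the dict of best counts is never written, so every
    match overwrites the previous one and the last match per hitId wins.
    """
    hitBestMismCount = {}
    posToHit = {}
    for hitId, mismCount, payload in matches:
        oldMismCount = hitBestMismCount.get(hitId, 9999)
        if mismCount < oldMismCount:
            posToHit[hitId] = (mismCount, payload)
            if update_best:
                hitBestMismCount[hitId] = mismCount  # the proposed bugfix
    return posToHit

matches = [("chr1:100", 1, "a"), ("chr1:100", 3, "b")]
print(best_hits(matches, update_best=False))  # {'chr1:100': (3, 'b')}
print(best_hits(matches, update_best=True))   # {'chr1:100': (1, 'a')}
```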

Error writing primers to file

I am trying to generate amplicon primers for guide RNAs. I am running the following command:
python crispor.py hg38 sampleFiles/test/input.fa sampleFiles/test/output.tsv --satMutDir=sampleFiles/test/ -d

and getting the following error. I have looked in the primer3Out.txt file and the primers are there, but they are not getting processed and written properly in the next steps.

DEBUG:root:Running set -o pipefail; /scratch/groups/pritch/jake/crispor/crisporWebsite/bin/Linux/primer3_core /tmp/primer3In.txt > /tmp/primer3Out.txt
INFO:root:Writing primers to sampleFiles/test/seq1_ontargetPrimers.tsv
DEBUG:root:reading offtargets from /tmp/crisporpLUEKA/FuOwg3SoSc4U3oJnIDkt.bed.gz
Traceback (most recent call last):
  File "crispor.py", line 8266, in <module>
    main()
  File "crispor.py", line 8264, in main
    mainCommandLine()
  File "crispor.py", line 8109, in mainCommandLine
    writeOntargetAmpliconFile("primers", batchId, options.ampLen, options.tm, pFh)
  File "crispor.py", line 5681, in writeOntargetAmpliconFile
    effScores = allEffScores[pamId]
KeyError: 's20+'

[E::bwa_idx_load] fail to locate the index files on custom built genome

Sorry to bother you again. I built a custom genome using the following command and received this output:

sudo ../../tools/crisprAddGenome fasta JoinedScaffold.fasta --desc 'cionaRobustaKH|Ciona robusta|C. robusta|Ghost Joined Scaffolds' --gff KH.KHGene.2012.gff3

 ==== /tmp2/cionaRobustaKH/cionaRobustaKH.sizes exists - not indexing with BWA and not converting to twobit ==== 
moving /tmp2/cionaRobustaKH/cionaRobustaKH.gp to crispr genome dir /var/www/crispor/genomes/cionaRobustaKH
moving /tmp2/cionaRobustaKH/cionaRobustaKH.segments.bed to crispr genome dir /var/www/crispor/genomes/cionaRobustaKH
moving /tmp2/cionaRobustaKH/cionaRobustaKH.sizes to crispr genome dir /var/www/crispor/genomes/cionaRobustaKH
moving /tmp2/cionaRobustaKH/cionaRobustaKH.2bit to crispr genome dir /var/www/crispor/genomes/cionaRobustaKH
wrote /var/www/crispor/genomes/cionaRobustaKH/genomeInfo.tab

I then tried to run crispor.py and got a message saying that it cannot locate the index files:

~/bioinformatics_software/crisporWebsite/crispor.py cionaRobustaKH KH.test.fasta testOutput.tsv -o offtaget.tsv
INFO:root: * running on sequence 'KH2012:KH.C1.1.v1.A.ND1-1', guideLen=20, seqLen=995
[E::bwa_idx_load] fail to locate the index files
ERROR:root:Error: could not run command set -o pipefail; /Users/elowe3/bioinformatics_software/crisporWebsite/bin/Darwin/bwa bwasw -T 20 /Users/elowe3/bioinformatics_software/crisporWebsite/genomes/cionaRobustaKH/cionaRobustaKH.fa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crisporBestMatchqQEQg7.fa > /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crisporBestMatchXoRay1.sam.

I thought it was strange that I received the message "not indexing with BWA" when I created the genome, so I ran bwa index and linked the index files into every location I thought might solve the problem:

ls /Users/elowe3/bioinformatics_software/crisporWebsite/genomes/cionaRobustaKH/
JoinedScaffold.fasta	cionaRobustaKH.fa.amb	cionaRobustaKH.fa.bwt	cionaRobustaKH.fa.sa
KH.KHGene.2012.gff3	cionaRobustaKH.fa.ann	cionaRobustaKH.fa.pac

ls /tmp2/cionaRobustaKH/
cionaRobustaKH.fa	cionaRobustaKH.fa.ann	cionaRobustaKH.fa.pac
cionaRobustaKH.fa.amb	cionaRobustaKH.fa.bwt	cionaRobustaKH.fa.sa

 ls /var/www/crispor/genomes/cionaRobustaKH/
cionaRobustaKH.2bit		cionaRobustaKH.fa.pac		cionaRobustaKH.sizes
cionaRobustaKH.fa.amb		cionaRobustaKH.fa.sa		genomeInfo.tab
cionaRobustaKH.fa.ann		cionaRobustaKH.gp
cionaRobustaKH.fa.bwt		cionaRobustaKH.segments.bed

Now I'm stuck...
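bwa looks for its index next to the path it is given. It expects the five files that `bwa index` writes (.amb, .ann, .bwt, .pac, .sa) with that path as their prefix. In the failing command, that path is cionaRobustaKH.fa under the crisporWebsite checkout's genomes/ directory. crisprAddGenome, however, reported moving its files into /var/www/crispor/genomes/cionaRobustaKH. Since the index files were then symlinked around, checking that each link actually resolves can narrow things down. The helper below is a hypothetical diagnostic, not part of CRISPOR.

```python
import os

# Files written by `bwa index <fasta>` next to the FASTA (BWA 0.7.x).
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")

def check_bwa_index(fasta_path):
    """Return {suffix: 'ok' | 'missing' | 'broken symlink'} for a FASTA prefix.

    os.path.exists() follows symlinks, so a link whose target was moved or
    deleted is reported as 'broken symlink' rather than 'ok'.
    """
    report = {}
    for suffix in BWA_SUFFIXES:
        path = fasta_path + suffix
        if os.path.exists(path):
            report[suffix] = "ok"
        elif os.path.islink(path):
            report[suffix] = "broken symlink"
        else:
            report[suffix] = "missing"
    return report
```

Pass it the exact .fa path printed in the bwa error message, not the directory listed with ls.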

Error while downloading hg38 data

I am getting the following error when I try to download the hg38 database using this command:

sudo ./crisprAddGenome ucsc hg38

==== Downloading ensembl transcripts to gene name table ====
INFO:root:Running: wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/ensemblToGeneName.txt.gz -O - | zcat > /tmp/hg38/hg38.ensemblToGeneName.txt
--2018-07-27 17:41:30-- http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/ensemblToGeneName.txt.gz
Resolving hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)|128.114.119.163|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2018-07-27 17:41:32 ERROR 404: Not Found.

gzip: stdin: unexpected end of file
Could not run command wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/ensemblToGeneName.txt.gz -O - | zcat > /tmp/hg38/hg38.ensemblToGeneName.txt

UCSC has removed this file from that URL.

guides targeting repeat elements assigned high specificity

When using the analysis-set human genome (hg38anset), some guides targeting repeat elements are assigned a specificity score of 100, even though BLAT finds many matches for them in the genome. Is this because repeat elements are masked in hg38anset? Examples are the following guides targeting L1HS.fa (sequence pasted below). Most of the other guides targeting L1HS are assigned a specificity of 0, as expected; I'm not sure what distinguishes these.

CACCGCATATTCTCACTCATAGG
TCTTTCCTGCTTTCTCTTGTAGG
GTTTTCTTCTAGGGTTTTTATGG

L1HS
GGGAGGAGGAGCCAAGATGGCCGAATAGGAACAGCTCCGGTCTACAGCTCCCAGCGTGAGCGACGCAGAAGACGGGTGATTTCTGCATTTCCATCTGAGGTACCGGGTTCATCTCACTAGGGAGTGCCAGACAGTGGGCGCAGGCCAGTGTGTGTGCGCACCGTGCGCGAGCCGAAGCAGGGCGAGGCATTGCCTCACCTGGGAAGCGCAAGGGGTCAGGGAGTTCCCTTTCCGAGTCAAAGAAAGGGGTGACGGACGCACCTGGAAAATCGGGTCACTCCCACCCGAATATTGCGCTTTTCAGACCGGCTTAAGAAACGGCGCACCACGAGACTATATCCCACACCTGGCTCAGAGGGTCCTACGCCCACGGAATCTCGCTGATTGCTAGCACAGCAGTCTGAGATCAAACTGCAAGGCGGCAACGAGGCTGGGGGAGGGGCGCCCGCCATTGCCCAGGCTTGCTTAGGTAAACAAAGCAGCCGGGAAGCTCGAACTGGGTGGAGCCCACCACAGCTCAAGGAGGCCTGCCTGCCTCTGTAGGCTCCACCTCTGGGGGCAGGGCACAGACAAACAAAAAGACAGCAGTAACCTCTGCAGACTTAAGTGTCCCTGTCTGACAGCTTTGAAGAGAGCAGTGGTTCTCCCAGCACGCAGCTGGAGATCTGAGAACGGGCAGACTGCCTCCTCAAGTGGGTCCCTGACCCCTGACCCCCGAGCAGCCTAACTGGGAGGCACCCCCCAGCAGGGGCACACTGACACCTCACACGGCAGGGTATTCCAACAGACCTGCAGCTGAGGGTCCTGTCTGTTAGAAGGAAAACTAACAACCAGAAAGGACATCTACACCGAAAACCCATCTGTACATCACCATCATCAAAGACCAAAAGTAGATAAAACCACAAAGATGGGGAAAAAACAGAACAGAAAAACTGGAAACTCTAAAACGCAGAGCGCCTCTCCTCCTCCAAAGGAACGCAGTTCCTCACCAGCAACAGAACAAAGCTGGATGGAGAATGATTTTGACGAGCTGAGAGAAGAAGGCTTCAGACGATCAAATTACTCTGAGCTACGGGAGGACATTCAAACCAAAGGCAAAGAAGTTGAAAACTTTGAAAAAAATTTAGAAGAATGTATAACTAGAATAACCAATACAGAGAAGTGCTTAAAGGAGCTGATGGAGCTGAAAACCAAGGCTCGAGAACTACGTGAAGAATGCAGAAGCCTCAGGAGCCGATGCGATCAACTGGAAGAAAGGGTATCAGCAATGGAAGATGAAATGAATGAAATGAAGCGAGAAGGGAAGTTTAGAGAAAAAAGAATAAAAAGAAATGAGCAAAGCCTCCAAGAAATATGGGACTATGTGAAAAGACCAAATCTACGTCTGATTGGTGTACCTGAAAGTGATGTGGAGAATGGAACCAAGTTGGAAAACACTCTGCAGGATATTATCCAGGAGAACTTCCCCAATCTAGCAAGGCAGGCCAACGTTCAGATTCAGGAAATACAGAGAACGCCACAAAGATACTCCTCGAGAAGAGCAACTCCAAGACACATAATTGTCAGATTCACCAAAGTTGAAATGAAGGAAAAAATGTTAAGGGCAGCCAGAGAGAAAGGTCGGGTTACCCTCAAAGGAAAGCCCATCAGACTAACAGCGGATCTCTCGGCAGAAACCCTACAAGCCAGAAGAGAGTGGGGGCCAATATTCAACATTCTTAAAGAAAAGAATTTTCAACCCAGAATTTCATATCCAGCCAAACTAAGCTTCATAAGTGAAGGAGAAATAAAATACTTTATAGACAAGCAAATGCTGAGAGATTTTGTCACCACCAGGCCTGCCCTAAAAGAGCTCCTGAAGGAAGCGCTAAACATGGAAAGGAACAACCGGTACCAGCCGCTGCAAAATCATGCCAAAATGTAAAGACCATCGAGACTAGGAAGAAACTGCATCAACTAATGAGCAAAATCACCAGCTAACATCATAATGACAGGATCAA
ATTCACACATAACAATATTAACTTTAAATATAAATGGACTAAATTCTGCAATTAAAAGACACAGACTGGCAAGTTGGATAAAGAGTCAAGACCCATCAGTGTGCTGTATTCAGGAAACCCATCTCACGTGCAGAGACACACATAGGCTCAAAATAAAAGGATGGAGGAAGATCTACCAAGCCAATGGAAAACAAAAAAAGGCAGGGGTTGCAATCCTAGTCTCTGATAAAACAGACTTTAAACCAACAAAGATCAAAAGAGACAAAGAAGGCCATTACATAATGGTAAAGGGATCAATTCAACAAGAGGAGCTAACTATCCTAAATATTTATGCACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTCCTCAGTGACCTACAAAGAGACTTAGACTCCCACACATTAATAATGGGAGACTTTAACACCCCACTGTCAACATTAGACAGATCAACGAGACAGAAAGTCAACAAGGATACCCAGGAATTGAACTCAGCTCTGCACCAAGCAGACCTAATAGACATCTACAGAACTCTCCACCCCAAATCAACAGAATATACATTTTTTTCAGCACCACACCACACCTATTCCAAAATTGACCACATAGTTGGAAGTAAAGCTCTCCTCAGCAAATGTAAAAGAACAGAAATTATAACAAACTATCTCTCAGACCACAGTGCAATCAAACTAGAACTCAGGATTAAGAATCTCACTCAAAGCCGCTCAACTACATGGAAACTGAACAACCTGCTCCTGAATGACTACTGGGTACATAACGAAATGAAGGCAGAAATAAAGATGTTCTTTGAAACCAACGAGAACAAAGACACCACATACCAGAATCTCTGGGACGCATTCAAAGCAGTGTGTAGAGGGAAATTTATAGCACTAAATGCCTACAAGAGAAAGCAGGAAAGATCCAAAATTGACACCCTAACATCACAATTAAAAGAACTAGAAAAGCAAGAGCAAACACATTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAAATCAGAGCAGAACTGAAGGAAATAGAGACACAAAAAACCCTTCAAAAAATCAATGAATCCAGGAGCTGGTTTTTTGAAAGGATCAACAAAATTGATAGACCGCTAGCAAGACTAATAAAGAAAAAAAGAGAGAAGAATCAAATAGACACAATAAAAAATGATAAAGGGGATATCACCACCGATCCCACAGAAATACAAACTACCATCAGAGAATACTACAAACACCTCTACGCAAATAAACTAGAAAATCTAGAAGAAATGGATACATTCCTCGACACATACACTCTCCCAAGACTAAACCAGGAAGAAGTTGAATCTCTGAATAGACCAATAACAGGCTCTGAAATTGTGGCAATAATCAATAGTTTACCAACCAAAAAGAGTCCAGGACCAGATGGATTCACAGCCGAATTCTACCAGAGGTACAAGGAGGAACTGGTACCATTCCTTCTGAAACTATTCCAATCAATAGAAAAAGAGGGAATCCTCCCTAACTCATTTTATGAGGCCAGCATCATTCTGATACCAAAGCCGGGCAGAGACACAACCAAAAAAGAGAATTTTAGACCAATATCCTTGATGAACATTGATGCAAAAATCCTCAATAAAATACTGGCAAACCGAATCCAGCAGCACATCAAAAAGCTTATCCACCATGATCAAGTGGGCTTCATCCCTGGGATGCAAGGCTGGTTCAATATACGCAAATCAATAAATGTAATCCAGCATATAAACAGAGCCAAAGACAAAAACCACATGATTATCTCAATAGATGCAGAAAAAGCCTTTGACAAAATTCAACAACCCTTCATGCTAAAAACTCTCAATAAATTAGGTATTGATGGGACGTATTTCAAAATAATAAGAGCTATCTATGACAAACCCACAGCCAATATCATACTGAATGGGCAAAAACTGGAAGCATTCCCTTTGAAAACTGGCACAAGACAGGGATGCCCTCTCTCACCGCTCCTATTCAACATAGTG
TTGGAAGTTCTGGCCAGGGCAATCAGGCAGGAGAAGGAAATAAAGGGTATTCAATTAGGAAAAGAGGAAGTCAAATTGTCCCTGTTTGCAGACGACATGATTGTTTATCTAGAAAACCCCATCGTCTCAGCCCAAAATCTCCTTAAGCTGATAAGCAACTTCAGCAAAGTCTCAGGATACAAAATCAATGTACAAAAATCACAAGCATTCTTATACACCAACAACAGACAAACAGAGAGCCAAATCATGGGTGAACTCCCATTCACAATTGCTTCAAAGAGAATAAAATACCTAGGAATCCAACTTACAAGGGATGTGAAGGACCTCTTCAAGGAGAACTACAAACCACTGCTCAAGGAAATAAAAGAGGACACAAACAAATGGAAGAACATTCCATGCTCATGGGTAGGAAGAATCAATATCGTGAAAATGGCCATACTGCCCAAGGTAATTTACAGATTCAATGCCATCCCCATCAAGCTACCAATGACTTTCTTCACAGAATTGGAAAAAACTACTTTAAAGTTCATATGGAACCAAAAAAGAGCCCGCATCGCCAAGTCAATCCTAAGCCAAAAGAACAAAGCTGGAGGCATCACACTACCTGACTTCAAACTATACTACAAGGCTACAGTAACCAAAACAGCATGGTACTGGTACCAAAACAGAGATATAGATCAATGGAACAGAACAGAGCCCTCAGAAATAATGCCGCATATCTACAACTATCTGATCTTTGACAAACCTGAGAAAAACAAGCAATGGGGAAAGGATTCCCTATTTAATAAATGGTGCTGGGAAAACTGGCTAGCCATATGTAGAAAGCTGAAACTGGATCCCTTCCTTACACCTTATACAAAAATCAATTCAAGATGGATTAAAGATTTAAACGTTAGACCTAAAACCATAAAAACCCTAGAAGAAAACCTAGGCATTACCATTCAGGACATAGGCGTGGGCAAGGACTTCATGTCCAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGAACAGGCAACCTACAACATGGGAGAAAATTTTCGCAACCTACTCATCTGACAAAGGGCTAATATCCAGAATCTACAATGAACTCAAACAAATTTACAAGAAAAAAACAAACAACCCCATCAAAAAGTGGGCGAAGGACATGAACAGACACTTCTCAAAAGAAGACATTTATGCAGCCAAAAAACACATGAAGAAATGCTCATCATCACTGGCCATCAGAGAAATGCAAATCAAAACCACTATGAGATATCATCTCACACCAGTTAGAATGGCAATCATTAAAAAGTCAGGAAACAACAGGTGCTGGAGAGGATGTGGAGAAATAGGAACACTTTTACACTGTTGGTGGGACTGTAAACTAGTTCAACCATTGTGGAAGTCAGTGTGGCGATTCCTCAGGGATCTAGAACTAGAAATACCATTTGACCCAGCCATCCCATTACTGGGTATATACCCAAAGGACTATAAATCATGCTGCTATAAAGACACATGCACACGTATGTTTATTGCGGCACTATTCACAATAGCAAAGACTTGGAACCAACCCAAATGTCCAACAATGATAGACTGGATTAAGAAAATGTGGCACATATACACCATGGAATACTATGCAGCCATAAAAAATGATGAGTTCATATCCTTTGTAGGGACATGGATGAAATTGGAAACCATCATTCTCAGTAAACTATCGCAAGAACAAAAAACCAAACACCGCATATTCTCACTCATAGGTGGGAATTGAACAATGAGATCACATGGACACAGGAAGGGGAATATCACACTCTGGGGACTGTGGTGGGGTCGGGGGAGGGGGGAGGGATAGCATTGGGAGATATACCTAATGCTAGATGACACGTTAGTGGGTGCAGCGCACCAGCATGGCACATGTATACATATGTAACTAACCTGCACAATGTGCACATGTAC
CCTAAAACTTAGAG
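An exact search on both strands is enough to check where the three guides fall in the pasted L1HS copy. The sketch below uses hypothetical helpers, not CRISPOR code. It upper-cases both sequences, so lowercase (soft-masked) repeat sequence is matched too. It is only an exact-match lookup, not the mismatch-tolerant, genome-wide off-target search CRISPOR performs, so by itself it cannot explain the specificity scores. The L1HS sequence above is wrapped across several lines and should be joined into one string first.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_guide(guide, target):
    """Return (strand, 0-based start) of every exact occurrence of `guide`
    (protospacer + PAM) in `target`, searching both strands, ignoring case."""
    target = target.upper()
    hits = []
    for strand, query in (("+", guide.upper()), ("-", revcomp(guide).upper())):
        start = target.find(query)
        while start != -1:
            hits.append((strand, start))
            start = target.find(query, start + 1)
    return hits

print(find_guide("CACCGCATATTCTCACTCATAGG", "ttCACCGCATATTCTCACTCATAGGtt"))
# [('+', 2)]
```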

MIT Specificity Score source code

Where in the repository files can one find the code implementing the MIT specificity score? The CRISPOR paper references the original MIT paper and accompanying website, but the paper leaves out the details and the website (http://crispr.mit.edu) is no longer active. Please let me know. Thanks!
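The scheme described with the original MIT tool (Hsu et al., 2013) combines three terms per off-target:

- a product of position-specific mismatch weights;
- a term that penalizes mismatches clustered close together;
- 1/n² for n mismatches.

The per-guide score is then commonly computed as 100 * 100 / (100 + sum of off-target hit scores). The sketch below is an independent reconstruction from that published description, not code taken from this repository. The handling of 0 and 1 mismatches and the exclusion of the on-target hit are assumptions, and CRISPOR's own rounding or filtering may differ.

```python
# Position-specific mismatch weights as published by Hsu et al. (2013);
# index 0 is the PAM-distal end of the 20-nt protospacer.
HSU_WEIGHTS = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
               0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583]

def mit_hit_score(guide, offtarget):
    """Score one off-target (both 20 nt, PAM excluded) on a 0-100 scale."""
    if len(guide) != 20 or len(offtarget) != 20:
        raise ValueError("expected two 20-nt sequences without PAM")
    mism = [i for i in range(20) if guide[i] != offtarget[i]]
    n = len(mism)
    # Term 1: product of (1 - weight) over all mismatch positions.
    term1 = 1.0
    for i in mism:
        term1 *= 1.0 - HSU_WEIGHTS[i]
    # Term 2: mean distance d between adjacent mismatches; 1 if n < 2.
    if n < 2:
        term2 = 1.0
    else:
        d = (mism[-1] - mism[0]) / float(n - 1)
        term2 = 1.0 / (((19.0 - d) / 19.0) * 4 + 1)
    # Term 3: 1 / n^2, taken as 1 for a perfect match.
    term3 = 1.0 / (n * n) if n else 1.0
    return term1 * term2 * term3 * 100

def mit_guide_specificity(hit_scores):
    """Aggregate off-target hit scores (on-target excluded) into 0-100."""
    return 100.0 * 100.0 / (100.0 + sum(hit_scores))
```

A guide with no off-targets scores 100. A single perfect off-target (hit score 100) halves that to 50.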

Upgrade to Python3?

Do you have plans on upgrading to Python3?
pip says it'll drop support for Python2 six months from now.

Add semantic versioning tags

Could you tag the latest working commit with a semantic version?
I want to checkout and use the latest stable version.

--gap option not present

The documentation says that --gap is to allow gaps in the match. When I try to run with that parameter, I get:
crispor.py: error: no such option: --gap

Is that option to be incorporated in the future?

cli (command line version) of the repo

We are interested in helping create a CLI-only version of CRISPOR. This would be useful for people looking to integrate CRISPOR into an existing workflow or a web app. It could also help advanced users who want to use CRISPOR programmatically.

Would this be of interest to the maintainers of this project? Even if not, it would be useful to discuss how we can keep this repo in sync with our CLI version.

Specific target sequence can reproducibly crash the site

First, just wanted to say this is a great tool! I've been working on a side-project of mine, a command line tool for this kind of thing (see github.com/aaronmck/DeepFry). I was looking at some of my negative Moreno-Mateos hits for comparison, and putting the following sequence (my most negative hit) into CRISPOR crashed the site with a '<type 'exceptions.KeyError'>' error:

AAAAATTTTTAAAAATTAGCTGG (with human NGG)

A screenshot is attached. Anyway, just wanted to give you a heads up, and thanks again for aggregating all these scoring methods, it's been incredibly useful!


Errors while trying to run command-line CRISPOR

Hi there! I have been trying to run command-line CRISPOR. Unfortunately it keeps throwing root errors and I haven't been able to generate any guides. This is the code I have been running:
singularity run docker://pinellolab/crispor_crispresso_nat_prot crispor.py hg19 /net/rcstorenfs02/ifs/rc_labs/liau_lab/ceejay/crispor_input.fasta /net/rcstorenfs02/ifs/rc_labs/liau_lab/ceejay/crispor_output.tsv

And this is the output I have been receiving:

Creating container runtime...
Exploding layer: sha256:8ad8b3f87b378cfae583fef34e47a3c9203847d779961b7351cbf786af0bc09f.tar.gz
Exploding layer: sha256:9ceb84f9e6d343cce995c6e223ae3418a87b25a167aba922953341e94f535cf1.tar.gz
Exploding layer: sha256:c675caf21052d23973c95d259dbac97430d796da3fab2fc09802eb69ae48486c.tar.gz
Exploding layer: sha256:0c972402cfac710d09baba50faace437a49e067d834ff71aba43b20c825c3428.tar.gz
Exploding layer: sha256:e90d10a215c5bef2f758bcad29fd62d4c3aa63c9390b98c7ca43595ffc0bef1c.tar.gz
Exploding layer: sha256:c26de10464117fe619c0d37e215d13f73329c23274248911bf2926b1e75f0880.tar.gz
Exploding layer: sha256:44cac1c509e7f6fe4352f8aa3e45e7b8c5c0888c5f14821d433380e84b0e42a6.tar.gz
Exploding layer: sha256:ac83b0b490e58aecd2a2605f0551cd1b52d9e894ad44f7c0d69511703f99ecae.tar.gz
Exploding layer: sha256:0922aaa6836662bd082286e2a09e9e7eea13f8f56b77a5593e748109ad291b03.tar.gz
Exploding layer: sha256:70d872d8f6465e89f63f7ed47604f2b83a33a1c22fc873709d3756b3c0b72e5c.tar.gz
Exploding layer: sha256:d630f1c18b4996fbe11e0df6981e75b0e659ddcd095a059b0deff70fbad54d51.tar.gz
Exploding layer: sha256:2960bc1f6572ea3c80c3f67dff1cfa917619571d95f5bec2843d5b3076adf967.tar.gz
Exploding layer: sha256:226437feb6f94b5e2e4f2f4b4b7a1c445b6d3f06ead88f2dd0c5f62640ecf7f9.tar.gz
Exploding layer: sha256:0dc4a9118e98c8a685a71f616fc21d2fac1b0ed3425bc87c6c87949903b45d93.tar.gz
Exploding layer: sha256:18b5faa6fdbc67e5090848e782cc2fb65b39a01b34d1ad080ac4f4af8b506e22.tar.gz
Exploding layer: sha256:f578d6a95786009d26fc050c4d46af5592023f6f11a12cd5a30e74f088b46255.tar.gz
Exploding layer: sha256:155353c8b2c0d1a6182763bd818f06c42513f52af0f437097f6e20e05acf5497.tar.gz
Exploding layer: sha256:978aea45006018bc20f94cd5404a9f3349b9c7a2c28157f6ccaec1cf1080d32e.tar.gz
Exploding layer: sha256:ef7844c5f02d1915239aa4abac3ed3728d81c2c231a6a1a3f4a8f544dc7562a5.tar.gz
[WARN  tini (103249)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
INFO:root:running on sequence ID 'hg19_wgEncodeGencodeBasicV19_ENST00000413440.1_1 range=chr1:14104913-14109326 5'pad=0 3'pad=0 strand=+ repeatMasking=none'
ERROR:root:no match found for sequence /net/rcstorenfs02/ifs/rc_labs/liau_lab/ceejay/crispor_input.fasta in genome hg19
INFO:root:Progress oAybu0NdRWqXk0IOTZ2b - effScores - Calculating guide efficiency scores
INFO:root:Wrote eff scores to /dev/shm/crisporG5dXIS/oAybu0NdRWqXk0IOTZ2b.effScores.tab
INFO:root:Progress oAybu0NdRWqXk0IOTZ2b - bwa - Alignment of potential guides, mismatches <= 4
[bwa_aln] fail to locate the index
ERROR:root:Error: could not run command set -o pipefail; /crisporWebsite/bin/Linux/bwa aln -o 0 -m 1980000 -n 4 -k 4 -N -l 20 /crisporWebsite/genomes/hg19/hg19.fa /dev/shm/crisporG5dXIS/oAybu0NdRWqXk0IOTZ2b.fa > /dev/shm/crisporG5dXIS/oAybu0NdRWqXk0IOTZ2b.sa.

I have also been running an adapted version of the code from the Nat Prot paper:

singularity run \
-B /net/rcstorenfs02/ifs/rc_labs/liau_lab/ceejay:/DATA \
-B /net/rcstorenfs02/ifs/rc_labs/liau_lab/crispor_genomes:/crisporWebsite/genomes \
-W /DATA \
docker://pinellolab/crispor_crispresso_nat_prot \
crispor.py hg19 crispor_input.fasta crispor_output.tsv --satMutDir=./

and it gives the same errors.

Do you have any suggestions why this might be happening? Thank you!

output of guide cfd score

The current version of CRISPOR seems to calculate the guide CFD score but doesn't print it to the output; only the MIT specificity score and the number of off-targets are printed. Could you point me to the part of the code that should be modified to include the CFD score in the output? Thank you!

Local Installation

Hi,
I'm trying to install the local version of CRISPOR, but getting the following error. Can you help me?

python crispor.py sacCer3 sampleFiles/in/sample.sacCer3.fa sampleFiles/mine/sample.sacCer3.tsv -o sampleFiles/mine/sample.sacCer3.mine.offs.tsv
INFO:root: * running on sequence 'testSeq', guideLen=20, seqLen=182
INFO:root:Progress x50sPGMoTvUagv3zWjGg - bwasw - Searching genome for one 100% identical match to input sequence
[bsw2_aln] read 1 sequences/pairs (182 bp) ...
[main] Version: 0.7.9a-r786
[main] CMD: /Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/bin/Darwin/bwa bwasw -b 100 -q 100 -T 20 /Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/genomes/sacCer3/sacCer3.fa /var/tmp/primer3In6eca1vkx.txt
[main] Real time: 0.007 sec; CPU: 0.010 sec
INFO:root:Progress x50sPGMoTvUagv3zWjGg - effScores - Calculating guide efficiency scores
/opt/homebrew/Caskroom/miniconda/base/envs/crispor/lib/python3.9/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator DummyRegressor from version 1.1.1 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
warnings.warn(
Traceback (most recent call last):
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 8824, in <module>
    main()
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 8821, in main
    mainCommandLine()
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 8621, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 4643, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 4165, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 3767, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crispor.py", line 3704, in calcSaveEffScores
    effScores = crisporEffScores.calcAllScores(longSeqs, enzyme=enz, scoreNames=scoreNames)
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crisporEffScores.py", line 922, in calcAllScores
    scores["fusi"] = calcAziScore(trimSeqs(seqs, -24, 6))
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/crisporEffScores.py", line 1167, in calcAziScore
    score = azimuth.model_comparison.predict(numpy.array([seq]), None, None, pam_audit=False)[0]
  File "/Users/fisher21/Desktop/CRISPR_Physalia_workshop/test/crisporWebsite/bin/Azimuth-2.0/azimuth/model_comparison.py", line 544, in predict
    model, learn_options = pickle.load(f)
  File "sklearn/tree/_tree.pyx", line 714, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1418, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got: [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
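In this output, the InconsistentVersionWarning is printed first: the model was pickled with scikit-learn 1.1.1 and loaded with 1.3.0. The actual failure only comes later, inside the tree unpickling. When debugging this kind of mismatch, it can help to make such warnings fatal, so the load stops at the first version complaint. The sketch below is a generic, hypothetical helper; it is not how Azimuth's model_comparison.predict loads its model. It does not fix the mismatch itself, which needs a scikit-learn version compatible with the pickled model.

```python
import pickle
import warnings

def load_pickle_strict(path):
    """Unpickle `path`, turning any warning raised while unpickling into an error.

    scikit-learn emits InconsistentVersionWarning when an estimator pickled
    with one version is loaded with another; escalating it stops the load at
    the first mismatch instead of failing later with a dtype error.
    """
    with open(path, "rb") as fh:
        with warnings.catch_warnings():
            warnings.simplefilter("error")
            return pickle.load(fh)
```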

Error using rat rn6 genome on local installation

I downloaded the rat rn6 genome provided on the CRISPOR website, and tried to use it to find off-target locations for gRNAs using a .bed file as an input. This has previously worked for me very well using the hg38anset and mm10 genomes.

However, with the rn6 genome, I get an error:

python crisporWebsite/crispor.py rn6 Input_for_CRISPOR/input.bed Output_from_CRISPOR/output.tsv -o Output_from_CRISPOR/output_offs.tsv
Traceback (most recent call last):
  File "crisporWebsite/crispor.py", line 8482, in <module>
    main()
  File "crisporWebsite/crispor.py", line 8479, in main
    mainCommandLine()
  File "crisporWebsite/crispor.py", line 8241, in mainCommandLine
    seqList = getGenomeSeqs(org, regions)
  File "crisporWebsite/crispor.py", line 6979, in getGenomeSeqs
    seq = tbf[chrom][start:end]
  File "/usr/local/lib/python2.7/dist-packages/twobitreader/__init__.py", line 432, in __getitem__
    return self.get_slice(min_=slice_or_key.start, max_=slice_or_key.stop)
  File "/usr/local/lib/python2.7/dist-packages/twobitreader/__init__.py", line 517, in get_slice
    more_bytes=morebytes)
  File "/usr/local/lib/python2.7/dist-packages/twobitreader/__init__.py", line 158, in longs_to_char_array
    raise ValueError('array_size must be at least 0')
ValueError: array_size must be at least 0

I haven't been able to figure out what might cause this error... Any help would be much appreciated!
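The exception comes from twobitreader while getGenomeSeqs slices `tbf[chrom][start:end]`, with a negative array size. One plausible trigger is a region whose start lies after its end, or that runs past the end of the chromosome. The report doesn't establish which applies here. The sketch below checks BED regions against chromosome lengths before any sequence is extracted. It assumes a two-column chromosome-sizes file like the `.sizes` file crisprAddGenome writes into each genome directory (visible in the cionaRobustaKH issue on this page). The function names are hypothetical.

```python
def read_sizes(path):
    """Read a two-column 'chrom<TAB>length' file into a dict."""
    sizes = {}
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes

def check_bed_regions(regions, chrom_sizes):
    """Return (chrom, start, end, reason) for every region that cannot be
    sliced cleanly. `regions` holds (chrom, start, end) tuples in 0-based,
    half-open BED coordinates."""
    problems = []
    for chrom, start, end in regions:
        if chrom not in chrom_sizes:
            problems.append((chrom, start, end, "unknown chromosome"))
        elif start < 0 or start > end:
            problems.append((chrom, start, end, "start > end or negative start"))
        elif end > chrom_sizes[chrom]:
            problems.append((chrom, start, end, "end beyond chromosome length"))
    return problems
```

Usage would be along the lines of `check_bed_regions(regions, read_sizes("genomes/rn6/rn6.sizes"))`, run on the regions parsed from input.bed.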

crispor.py command fails on the test dataset

I installed all the software and Python packages on CentOS Linux using Miniconda3.

> uname --kernel-name --kernel-release --machine
Linux 3.10.0-1160.66.1.el7.x86_64 x86_64

> cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

> conda --version
conda 22.9.0

I tested crispor.py by running the following command from the README file.

> ./crispor.py sacCer3 sampleFiles/mine/sample.sacCer3.fa sampleFiles/mine/sample.sacCer.tsv -o sampleFiles/mine/test_run.tsv

INFO:root: * running on sequence 'testSeq', guideLen=20, seqLen=182
INFO:root:Progress x50sPGMoTvUagv3zWjGg - bwasw - Searching genome for one 100% identical match to input sequence
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (182 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /aryeelab/aayushraman/crisporWebsite/bin/Linux/bwa bwasw -b 100 -q 100 -T 20 /aryeelab/aayushraman/crisporWebsite/genomes/sacCer3/sacCer3.fa /tmp/primer3Invk1AD7.txt
[main] Real time: 0.029 sec; CPU: 0.027 sec
INFO:root:Progress x50sPGMoTvUagv3zWjGg - effScores - Calculating guide efficiency scores
Traceback (most recent call last):
  File "./crispor.py", line 8726, in <module>
    main()
  File "./crispor.py", line 8723, in main
    mainCommandLine()
  File "./crispor.py", line 8524, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "./crispor.py", line 4523, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "./crispor.py", line 4045, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "./crispor.py", line 3647, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "./crispor.py", line 3584, in calcSaveEffScores
    effScores = crisporEffScores.calcAllScores(longSeqs, enzyme=enz, scoreNames=scoreNames)
  File "/aryeelab/aayushraman/crisporWebsite/crisporEffScores.py", line 927, in calcAllScores
    scores["fusi"] = calcAziScore(trimSeqs(seqs, -24, 6))
  File "/aryeelab/aayushraman/crisporWebsite/crisporEffScores.py", line 1149, in calcAziScore
    import azimuth.model_comparison
  File "/aryeelab/aayushraman/crisporWebsite/bin/Azimuth-2.0/azimuth/model_comparison.py", line 1, in <module>
    import azimuth.predict as pd
  File "/aryeelab/aayushraman/crisporWebsite/bin/Azimuth-2.0/azimuth/predict.py", line 3, in <module>
    from sklearn.metrics import roc_curve, auc
  File "/aryeelab/aayushraman/bin/virtual_env/crispor/lib/python2.7/site-packages/sklearn/metrics/__init__.py", line 7, in <module>
    from .ranking import auc
  File "/aryeelab/aayushraman/bin/virtual_env/crispor/lib/python2.7/site-packages/sklearn/metrics/ranking.py", line 26, in <module>
    from ..utils import check_consistent_length
  File "/aryeelab/aayushraman/bin/virtual_env/crispor/lib/python2.7/site-packages/sklearn/utils/__init__.py", line 10, in <module>
    from .murmurhash import murmurhash3_32
  File "numpy.pxd", line 861, in init sklearn.utils.murmurhash (sklearn/utils/murmurhash.c:5033)

ValueError: numpy.ufunc has the wrong size, try recompiling

I am getting the numpy.ufunc error. Can you please help? Thanks!

Best,
-Ar
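"numpy.ufunc has the wrong size" is commonly reported when a compiled package was built against a different numpy version than the one installed. Here that package is scikit-learn's murmurhash extension. A useful first step is to record exactly which versions the environment loads. The sketch below is a hypothetical helper that runs under both Python 2.7 and 3. It catches import failures too, since here importing sklearn itself raises the error.

```python
import importlib

def report_versions(module_names):
    """Map each module name to its __version__, or to the error raised on import.

    Importing a compiled extension built against another numpy can itself
    fail (e.g. ValueError: numpy.ufunc has the wrong size), so every exception
    is caught and recorded instead of aborting the whole check.
    """
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:
            report[name] = "import failed: %s: %s" % (type(exc).__name__, exc)
        else:
            report[name] = getattr(module, "__version__", "unknown version")
    return report

print(report_versions(["numpy", "scipy", "sklearn"]))
```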

dbm problems

From Jubina Benny:

When I try running the example command:
python crispor.py sacCer3 sampleFiles/in/sample.sacCer3.fa sampleFiles/mine/sample.sacCer3.tsv -o sampleFiles/mine/sample.sacCer3.mine.offs.tsv

I get this error:

Traceback (most recent call last):
  File "crispor.py", line 8458, in <module>
    main()
  File "crispor.py", line 8455, in main
    mainCommandLine()
  File "crispor.py", line 8264, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "crispor.py", line 4377, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "crispor.py", line 3912, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "crispor.py", line 3515, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "crispor.py", line 3457, in calcSaveEffScores
    saveOutcomeData(batchId, mutScores)
  File "crispor.py", line 4254, in saveOutcomeData
    import dbm
ImportError: No module named dbm
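On Python 2, `dbm` is an optional C extension. It is typically built only when the ndbm/gdbm headers were available when Python was compiled, so some Python builds simply lack it. The sketch below falls back to the pure-Python `dumbdbm` module that ships with Python 2. It assumes that any DBM backend is acceptable for the data `saveOutcomeData` stores, which has not been checked against crispor.py, and `roundtrip` is a hypothetical helper. On Python 3, `dbm` is a package and always imports.

```python
# Hypothetical import shim; not CRISPOR's own code.
try:
    import dbm  # C ndbm wrapper on Python 2; a package of backends on Python 3
except ImportError:
    import dumbdbm as dbm  # pure-Python fallback bundled with Python 2

import os
import tempfile


def roundtrip(key, value):
    """Store one key/value pair in a fresh DBM file and read it back."""
    path = os.path.join(tempfile.mkdtemp(), "outcomes")
    db = dbm.open(path, "c")  # "c": create the database if it does not exist
    try:
        db[key] = value
        return db[key]
    finally:
        db.close()


if __name__ == "__main__":
    print(roundtrip(b"guide1", b"0.42"))
```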

crisprAddGenome bug

From Jubina Benny:

INFO:root:converting genePred to segment names: cat /tmp/MusaPhahang/MusaPhahang.gp | awk 'BEGIN {FS=OFS=" ";} // { gsub(" " , "", $1); gsub(" ", "", $12); print }' | genePredToBed stdin stdout | python bedToExons stdin stdout | sort -u | bedSort stdin stdout | python bedOverlapMerge /dev/stdin /dev/stdout | python bedBetween stdin /dev/stdout -a -s /tmp/MusaPhahang/MusaPhahang.sizes | bedSort stdin /tmp/MusaPhahang/MusaPhahang.segments.bed
INFO:root:Running: ['/bin/bash', '-e', '-o', 'pipefail', '-c', 'cat /tmp/MusaPhahang/MusaPhahang.gp | awk 'BEGIN {FS=OFS="\t";} // { gsub(" " , "", $1); gsub(" ", "", $12); print }' | genePredToBed stdin stdout | python bedToExons stdin stdout | sort -u | bedSort stdin stdout | python bedOverlapMerge /dev/stdin /dev/stdout | python bedBetween stdin /dev/stdout -a -s /tmp/MusaPhahang/MusaPhahang.sizes | bedSort stdin /tmp/MusaPhahang/MusaPhahang.segments.bed']
  File "bedToExons", line 1
SyntaxError: Non-ASCII character '\x88' in file bedToExons on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Traceback (most recent call last):
  File "bedOverlapMerge", line 67, in <module>
    printLine(lastChrom, lastStart, lastEnd, names)
  File "bedOverlapMerge", line 28, in printLine
    print "\t".join(row)
TypeError: sequence item 0: expected string, NoneType found
Reading beds ...
Reading stdin...
0 features read
Traceback (most recent call last):
  File "bedBetween", line 313, in <module>
    output(outf, None, beds[0], upstream, limit, "ig", chromSize)
IndexError: list index out of range
Could not run command cat /tmp/MusaPhahang/MusaPhahang.gp | awk 'BEGIN {FS=OFS=" ";} // { gsub(" " , "", $1); gsub(" ", "", $12); print }' | genePredToBed stdin stdout | python bedToExons stdin stdout | sort -u | bedSort stdin stdout | python bedOverlapMerge /dev/stdin /dev/stdout | python bedBetween stdin /dev/stdout -a -s /tmp/MusaPhahang/MusaPhahang.sizes | bedSort stdin /tmp/MusaPhahang/MusaPhahang.segments.bed
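Python 2 refuses to run a source file that contains non-ASCII bytes unless line 1 or 2 carries a PEP 263 encoding declaration such as `# -*- coding: utf-8 -*-`. The later `bedOverlapMerge` and `bedBetween` errors, together with `0 features read`, look like knock-on effects of `bedToExons` producing no output. A small checker like the one below can locate the offending bytes and report whether a declaration is present. The helper names are hypothetical, and the regular expression is the one PEP 263 specifies.

```python
import re

# PEP 263: an encoding declaration must match this on line 1 or 2.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def has_coding_declaration(data):
    """True if line 1 or 2 of the raw source bytes declares an encoding."""
    for line in data.splitlines()[:2]:
        if CODING_RE.match(line):
            return True
    return False


def non_ascii_lines(data):
    """Return (line number, first offending byte) for each line with bytes >= 0x80."""
    hits = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        for byte in bytearray(line):
            if byte >= 0x80:
                hits.append((lineno, byte))
                break
    return hits


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        source = handle.read()
    print("coding declaration: %s" % has_coding_declaration(source))
    for lineno, byte in non_ascii_lines(source):
        print("line %d: byte 0x%02x" % (lineno, byte))
```

Once the offending line is found, there are two options. You can add `# -*- coding: utf-8 -*-` as line 2, after the shebang. Or, if the bytes are stray debris, you can remove them.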

Missing gRNA warnings with command-line CRISPOR

The CRISPOR website warns about potential problems with gRNA sequences, such as the presence of TTTT, which terminates transcription from U6 promoters. I do not see these warnings in the output of the command-line version. Is this feature built into the command-line version, or should users do their own checks for these problems? Thanks!
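If the command-line output turns out not to include these flags, a user-side check is easy to add. A run of four or more T's acts as a Pol III termination signal for U6-driven guides. The sketch below flags such runs. Its function names, and the assumption that the guide sequences are taken from CRISPOR's guide table, are illustrative only.

```python
import re

# Four or more consecutive T's act as a Pol III (U6) termination signal.
POLY_T = re.compile(r"T{4,}")


def u6_terminator_runs(guide):
    """Return (start, end) spans of TTTT+ runs in a guide (0-based, end-exclusive)."""
    return [m.span() for m in POLY_T.finditer(guide.upper())]


def flag_guides(guides):
    """Map each guide sequence to True if it contains a U6 terminator run."""
    return dict((g, bool(u6_terminator_runs(g))) for g in guides)


if __name__ == "__main__":
    example = ["GACTTTTGACCTGAGGCTAG", "GACGTCAGTCAGTCAGTCAG"]
    for guide, bad in sorted(flag_guides(example).items()):
        print("%s\t%s" % (guide, "TTTT warning" if bad else "ok"))
```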

Errors running local installation

Hi there, I'm posting to ask for help with errors encountered trying to run CRISPOR locally. Please let me know if you have any ideas how to troubleshoot this. Any help would be greatly appreciated!

I do not have root/sudo privileges (I'm running this on a server where Python 3 is the default), so I installed CRISPOR in a virtual environment as follows:

$git clone https://github.com/maximilianh/crisporWebsite
$virtualenv --python=python2.7 crisporWebsite
$cd crisporWebsite
$source activate
(crisporWebsite)$pip install biopython numpy==1.14.0 scikit-learn==0.16.1 pandas twobitreader matplotlib scipy pytabix xlwt
(crisporWebsite)$ mv genomes.sample/ genomes/

All of which seemed to work without issue. I then tried the test and got:

(crisporWebsite)$ mkdir -p sampleFiles/mine/
(crisporWebsite)$ python crispor.py sacCer3 sampleFiles/in/sample.sacCer3.fa sampleFiles/mine/sample.sacCer3.tsv -o sampleFiles/mine/sample.sacCer3.mine.offs.tsv

INFO:root: * running on sequence 'testSeq', guideLen=20, seqLen=182
Traceback (most recent call last):
  File "crispor.py", line 8003, in <module>
    main()
  File "crispor.py", line 8001, in main
    mainCommandLine()
  File "crispor.py", line 7807, in mainCommandLine
    batchId, position, extSeq = newBatch(seqId, seq, org, pamPat, skipAlign)
  File "crispor.py", line 4054, in newBatch
    chrom, start, end, strand = findBestMatch(org, seq, batchId)
  File "crispor.py", line 6495, in findBestMatch
    remoteAddr = pipes.quote(os.environ["REMOTE_ADDR"])
  File "/lab/solexa_sabatini/genya/crispor/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'REMOTE_ADDR'
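`REMOTE_ADDR` is a CGI environment variable that a web server sets for each request, so it is absent when `crispor.py` is run from a shell. One workaround is to define it for the command, e.g. `REMOTE_ADDR=127.0.0.1 python crispor.py sacCer3 ...`. Another is to make the lookup tolerant, as in the sketch below. The function name and the `127.0.0.1` default are assumptions. The sketch uses `shlex.quote` (Python 3) and falls back to the `pipes.quote` call seen in the traceback.

```python
import os

try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2, as used in findBestMatch above


def remote_addr(environ=None, default="127.0.0.1"):
    """Shell-quoted client address, falling back to a default outside CGI."""
    if environ is None:
        environ = os.environ
    return quote(environ.get("REMOTE_ADDR", default))
```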

Turning on the test and debugging options produced:

(crisporWebsite)$ python crispor.py -td sacCer3 sampleFiles/in/sample.sacCer3.fa sampleFiles/mine/sample.sacCer3.tsv -o sampleFiles/mine/sample.sacCer3.mine.offs.tsv


File "crispor.py", line 1908, in __main__.calcCfdScore
Failed example:
    calcCfdScore("GGGGGGGGGGGGGGGGGGGGGGG", "aaaaGaGaGGGGGGGGGGGGGGG")
Expected:
    0.5140384614450001
    # mismatches: * !!
Got:
    0.5140384614450001


File "crispor.py", line 1918, in __main__.calcCfdScore
Failed example:
    calcCfdScore("ATGTGGAGATTGCCACCTACCGG", "ATCTGGAGATTGCCACCTACAGG")
Expected nothing
Got:
    0.384615385


File "crispor.py", line 952, in __main__.findPams
Failed example:
    findPams("TTTNCCCCCCCCCCCCCCCCCTTTN", "TTTN", "+", {}, set())
Expected:
    ({0: '+'}, set([3]))
Got:
    ({21: '+'}, set([25]))


File "crispor.py", line 956, in __main__.findPams
Failed example:
    findPams("AAACCCCCCCCCCCCCCCCCCCCC", "NAA", "-", {}, set())
Expected:
    ({}, set([]))
Got:
    ({0: '-'}, set([3]))


2 items had failures:
   2 of   6 in __main__.calcCfdScore
   2 of   9 in __main__.findPams
***Test Failed*** 4 failures.
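The two `calcCfdScore` failures look like doctest formatting quirks rather than wrong scores. doctest treats every line after an example, up to the next blank line or `>>>` prompt, as expected output. A `# mismatches: * !!` comment placed directly under `0.5140384614450001` therefore becomes part of what must be printed. Conversely, an example written without any expected output fails as soon as it returns a value. This does not explain the `findPams` failures, where the returned positions really differ. The sketch below reproduces the first effect with hypothetical functions.

```python
import doctest


def half_bad(x):
    """
    >>> half_bad(3)
    1.5
    # this comment line is read as expected output
    """
    return x / 2.0


def half_good(x):
    """
    >>> half_good(3)
    1.5

    A blank line ends the expected output, so this note is ignored.
    """
    return x / 2.0


def count_failures(func):
    """Run the doctests in func's docstring quietly and return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, globs={func.__name__: func}):
        failed += runner.run(test, out=lambda text: None).failed
    return failed


if __name__ == "__main__":
    print("half_bad failures: %d" % count_failures(half_bad))    # 1
    print("half_good failures: %d" % count_failures(half_good))  # 0
```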

Test ./crispor.py command fails.

This was done on an Ubuntu Docker image after installing the necessary Python dependencies.

I first ran the test command:

# ./crispor.py sacCer3 sampleFiles/in/sample.sacCer3.fa sampleFiles/mine/sample.sacCer3.tsv -o sampleFiles/mine/sample.sacCer3.mine.offs.tsv
INFO:root: * running on sequence 'testSeq', guideLen=20, seqLen=182
INFO:root:Progress x50sPGMoTvUagv3zWjGg - bwasw - Searching genome for one 100% identical match to input sequence
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (182 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /var/www/html/bin/Linux/bwa bwasw -T 20 /var/www/html/genomes/sacCer3/sacCer3.fa /tmp/crisporBestMatchaVjbG4.fa
[main] Real time: 0.538 sec; CPU: 0.044 sec
INFO:root:Progress x50sPGMoTvUagv3zWjGg - effScores - Calculating guide efficiency scores
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
Traceback (most recent call last):
  File "./crispor.py", line 8293, in <module>
    main()
  File "./crispor.py", line 8291, in main
    mainCommandLine()
  File "./crispor.py", line 8100, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "./crispor.py", line 4295, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "./crispor.py", line 3835, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "./crispor.py", line 3454, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "./crispor.py", line 3392, in calcSaveEffScores
    effScores = crisporEffScores.calcAllScores(longSeqs, enzyme=enz, scoreNames=scoreNames)
  File "/var/www/html/crisporEffScores.py", line 885, in calcAllScores
    scores["fusi"] = calcAziScore(trimSeqs(seqs, -24, 6))
  File "/var/www/html/crisporEffScores.py", line 1107, in calcAziScore
    import azimuth.model_comparison
  File "/var/www/html/bin/Azimuth-2.0/azimuth/model_comparison.py", line 1, in <module>
    import azimuth.predict as pd
  File "/var/www/html/bin/Azimuth-2.0/azimuth/predict.py", line 3, in <module>
    from sklearn.metrics import roc_curve, auc
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/__init__.py", line 7, in <module>
    from .ranking import auc
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/ranking.py", line 26, in <module>
    from ..utils import check_consistent_length
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/__init__.py", line 10, in <module>
    from .murmurhash import murmurhash3_32
ImportError: numpy.core.multiarray failed to import

Upgrading NumPy:

# pip install -U numpy
Collecting numpy
  Downloading https://files.pythonhosted.org/packages/1f/c7/198496417c9c2f6226616cff7dedf2115a4f4d0276613bab842ec8ac1e23/numpy-1.16.4-cp27-cp27mu-manylinux1_x86_64.whl (17.0MB)
    100% |################################| 17.0MB 79kB/s
Installing collected packages: numpy
  Found existing installation: numpy 1.11.2
    Uninstalling numpy-1.11.2:
      Successfully uninstalled numpy-1.11.2
Successfully installed numpy-1.16.4

Running the test command again:

# ./crispor.py sacCer3 sampleFiles/in/sample.sacCer3.fa sampleFiles/mine/sample.sacCer3.tsv -o sampleFiles/mine/sample.sacCer3.mine.offs.tsv
INFO:root: * running on sequence 'testSeq', guideLen=20, seqLen=182
INFO:root:Progress x50sPGMoTvUagv3zWjGg - bwasw - Searching genome for one 100% identical match to input sequence
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (182 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /var/www/html/bin/Linux/bwa bwasw -T 20 /var/www/html/genomes/sacCer3/sacCer3.fa /tmp/crisporBestMatch01h687.fa
[main] Real time: 0.026 sec; CPU: 0.015 sec
INFO:root:Progress x50sPGMoTvUagv3zWjGg - effScores - Calculating guide efficiency scores
/usr/local/lib/python2.7/dist-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Traceback (most recent call last):
  File "./crispor.py", line 8293, in <module>
    main()
  File "./crispor.py", line 8291, in main
    mainCommandLine()
  File "./crispor.py", line 8100, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "./crispor.py", line 4295, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "./crispor.py", line 3835, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "./crispor.py", line 3454, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "./crispor.py", line 3392, in calcSaveEffScores
    effScores = crisporEffScores.calcAllScores(longSeqs, enzyme=enz, scoreNames=scoreNames)
  File "/var/www/html/crisporEffScores.py", line 885, in calcAllScores
    scores["fusi"] = calcAziScore(trimSeqs(seqs, -24, 6))
  File "/var/www/html/crisporEffScores.py", line 1117, in calcAziScore
    score = azimuth.model_comparison.predict(numpy.array([seq]), None, None, pam_audit=False)
  File "/var/www/html/bin/Azimuth-2.0/azimuth/model_comparison.py", line 559, in predict
    feature_sets = feat.featurize_data(Xdf, learn_options, pandas.DataFrame(), gene_position, pam_audit=pam_audit, length_audit=length_audit)
  File "/var/www/html/bin/Azimuth-2.0/azimuth/features/featurization.py", line 31, in featurize_data
    get_all_order_nuc_features(data['30mer'], feature_sets, learn_options, learn_options["order"], max_index_to_use=30, quiet=quiet)
  File "/var/www/html/bin/Azimuth-2.0/azimuth/features/featurization.py", line 153, in get_all_order_nuc_features
    include_pos_independent=True, max_index_to_use=max_index_to_use, prefix=prefix)
  File "/var/www/html/bin/Azimuth-2.0/azimuth/features/featurization.py", line 425, in apply_nucleotide_features
    assert not np.any(np.isnan(feat_pd)), "nans here can arise from sequences of different lengths"
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 917, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
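After the NumPy upgrade, the failure moves into Azimuth's own assertion, `assert not np.any(np.isnan(feat_pd))`. With this NumPy/pandas combination, `np.any` on the DataFrame evidently returns a per-column Series rather than a single boolean, and a Series has no single truth value. Pinning the versions Azimuth was developed against is the cleaner fix. For a local patch, a version-independent form of the check can reduce over the underlying array, as in the sketch below. `has_nan` is a hypothetical helper, not part of Azimuth.

```python
import numpy as np
import pandas as pd


def has_nan(frame):
    """True if any cell of a DataFrame (or Series) is NaN, regardless of pandas version."""
    # pd.isnull returns a boolean frame; .values is a plain ndarray, and
    # ndarray.any() with no axis always reduces to a single numpy.bool_.
    return bool(pd.isnull(frame).values.any())


if __name__ == "__main__":
    feat_pd = pd.DataFrame({"A": [1.0, 0.0], "C": [0.0, np.nan]})
    print(has_nan(feat_pd))  # True
    # Illustrative replacement for Azimuth's check:
    # assert not has_nan(feat_pd), "nans here can arise from sequences of different lengths"
```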

Errors while running crispor.py script (AttributeError: 'NoneType' object has no attribute 'startswith')

I am getting the following error at the end of the job:
Traceback (most recent call last):
  File "crispor.py", line 7159, in <module>
    main()
  File "crispor.py", line 7157, in main
    mainCommandLine()
  File "crispor.py", line 7021, in mainCommandLine
    mergeGuideInfo(seq, startDict, pamPat, otMatches, position, effScores)
  File "crispor.py", line 1856, in mergeGuideInfo
    makePosList(org, pamMatches, guideSeqFull, pamPat, inputPos)
  File "crispor.py", line 1414, in makePosList
    alnHtml, hasLast12Mm = makeAlnStr(org, guideSeq, otSeq, pam, mitScore, cfdScore, posStr, dist)
  File "crispor.py", line 1298, in makeAlnStr
    if org.startswith("mm") or org.startswith("hg") or org.startswith("rn"):
AttributeError: 'NoneType' object has no attribute 'startswith'

However, with the sample genome everything runs smoothly.
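The traceback shows `makeAlnStr` receiving `org=None`. Because the sample genome works, this suggests the custom genome's identifier is not reaching that code path. As a stopgap, the prefix test on that line can be made None-safe. The sketch below is illustrative: the helper name is hypothetical, and it only avoids the crash without restoring whatever `org` is needed for elsewhere.

```python
UCSC_PREFIXES = ("mm", "hg", "rn")  # mouse, human and rat UCSC assembly names


def is_mouse_human_rat(org):
    """None-safe form of: org.startswith("mm") or ... or org.startswith("rn")."""
    return org is not None and org.startswith(UCSC_PREFIXES)
```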

File "../bin/samToBed", line 14, in <module> IndexError: list index out of range

I am receiving the following error and cannot figure out the problem. I'm running the code on the test data.

../crispor.cgi sacCer3 sampleIn.fa sampleOut.mine.tsv -o sampleOutOfftargets.mine.tsv
INFO:root:running on sequence ID 'testSeq'
[bsw2_aln] read 1 sequences/pairs (182 bp) ...
[main] Version: 0.7.9a-r786
[main] CMD: /Users/elowe3/bioinformatics_software/crisporWebsite/bin/Darwin/bwa bwasw -T 20 /Users/elowe3/bioinformatics_software/crisporWebsite/genomes/sacCer3/sacCer3.fa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crisporBestMatchT56QC_.fa
[main] Real time: 0.022 sec; CPU: 0.020 sec
INFO:root:Progress x50sPGMoTvUagv3zWjGg - effScores - Calculating guide efficiency scores
INFO:root:Wrote eff scores to /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.effScores.tab
INFO:root:Progress x50sPGMoTvUagv3zWjGg - bwa - Alignment of potential guides, mismatches <= 4
[bwa_aln_core] calculate SA coordinate... 0.08 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 7 sequences have been processed.
[main] Version: 0.7.9a-r786
[main] CMD: /Users/elowe3/bioinformatics_software/crisporWebsite/bin/Darwin/bwa aln -o 0 -m 1980000 -n 4 -k 4 -N -l 20 ../genomes/sacCer3/sacCer3.fa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.fa
[main] Real time: 0.122 sec; CPU: 0.116 sec
INFO:root:Progress x50sPGMoTvUagv3zWjGg - saiToBed - Converting alignments
[bwa_aln_core] convert to sequence coordinate... 0.01 sec
[bwa_aln_core] refine gapped alignments... 0.00 sec
[bwa_aln_core] print alignments... 0.00 sec
[bwa_aln_core] 7 sequences have been processed.
[main] Version: 0.7.9a-r786
[main] CMD: /Users/elowe3/bioinformatics_software/crisporWebsite/bin/Darwin/bwa samse -n 60000 ../genomes/sacCer3/sacCer3.fa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.sa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.fa
[main] Real time: 0.047 sec; CPU: 0.046 sec
Traceback (most recent call last):
  File "../bin/samToBed", line 14, in <module>
    guideLen = int(sys.argv[2])
IndexError: list index out of range
ERROR:root:Error: could not run command set -o pipefail; /Users/elowe3/bioinformatics_software/crisporWebsite/bin/Darwin/bwa samse -n 60000 ../genomes/sacCer3/sacCer3.fa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.sa /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.fa | ../bin/xa2multi.pl | ../bin/samToBed NGG | sort -k1,1 -k2,2n | /Users/elowe3/bioinformatics_software/crisporWebsite/bin/Darwin/bedClip stdin ../genomes/sacCer3/sacCer3.sizes stdout >> /var/folders/bz/lbll3mbn29v8bpf5w2hfdfyngydffc/T/crispormG6saZ/x50sPGMoTvUagv3zWjGg.matches.bed .
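`samToBed` reads `sys.argv[2]` as the guide length, but the failing pipeline calls it as `../bin/samToBed NGG`, with a single argument. That pattern suggests that `crispor.cgi` and `bin/samToBed` come from different versions of the repository; the other reports on this page run `crispor.py`. The sketch below shows an argument check that would turn this into a clear message. Treating `argv[1]` as the PAM is an assumption based on the `NGG` in the command, and the function name is hypothetical.

```python
import sys


def parse_sam_to_bed_args(argv):
    """Return (pam, guide_len) from argv or exit with a usage message.

    Hypothetical: assumes argv[1] is the PAM and argv[2] the guide length.
    """
    if len(argv) < 3:
        sys.exit("usage: samToBed PAM GUIDELEN (got %d argument(s))" % (len(argv) - 1))
    pam = argv[1]
    try:
        guide_len = int(argv[2])
    except ValueError:
        sys.exit("samToBed: GUIDELEN must be an integer, got %r" % argv[2])
    return pam, guide_len
```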

certain genomic target sequences are not found in the genome and crash the Lindel calculation

I found that, rarely, certain genomic ranges crash the command-line version (error output pasted at the bottom). On the website, entering these sequences produces the message "Query sequence, not found in the selected genome, Homo sapiens (hg38)", even though entering the genomic coordinates pulls up the target sequence.

>ENSG00000286185 AC242842.3 exon ENST00000621744.4_5 range=chr1:149482155-149482246 strand=+
ttctctgaatttatttacagAAAATGAAAGTGATGATGAGGAAGAGGAAGAAAAAGGGCCAGTGTCTCCCAGgtaatgttgtggaattgttg
>ENSG00000261832 AC138894.1 exon ENST00000637378.1_8 range=chr16:28458366-28458441 strand=-
tcatgtgttggctttttcagATCCCCCCTTCTGCAAGAAAGCCTCTTTGCAACTGGgtaagtttgtttgttttcct
>ENSG00000278662 GOLGA6L10 exon ENST00000610657.1_3 range=chr15:82346541-82346661 strand=-
tctctctgcatgcacctcagAGCCAGTACCAAGAACTAGCAGTGGCCCTGGATTCAAGCTCCGCAATAATCAGTCAACTCACTGAAAACATCAATTCACTGgtaagagtccagtggggtcc
>ENSG00000261247 GOLGA8T exon ENST00000569052.1_5 range=chr15:30139339-30139426 strand=+
ctgtcttcctcttcctacagGAAAAGAAAGCAAACAACAAGAAACAGAAAGCCAAAAGGGTGCTAGAGgtgagtggagggtgtgcagt
>ENSG00000204172 AGAP9 exon ENST00000452145.6_1 range=chr10:47522846-47522954 strand=-
ttctccctctatacatatagCTTTGGAGTTTAACCTTTCTGCCAATCCAGAGGCAAGCACAATATTCCAGAGGAACTCTCAAACAGATGgtgagacaacagtgtctgta
>ENSG00000125498 KIR2DL1 exon ENST00000336077.11_0 range=chr19:54769831-54769904 strand=+
ctgtctgctccggcagcaccATGTCGCTCTTGGTCGTCAGCATGGCGTGTGTTGgtgagtcctggaaagcaata
>ENSG00000240038 AMY2B exon ENST00000361355.8_0,ENST00000610648.1_0 range=chr1:103571583-103571790 strand=+
actgacaacttcaaagcaaaATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAGGACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTTAGCTCCCAAGGGATTTGGAGGGGTTCAGgtgggtatgattcatagtat
>ENSG00000263956 NBPF11 exon ENST00000615281.4_10 range=chr1:148114417-148114508 strand=-
ttctctgaatttatttacagAAAATGAAAGTGATGATGAGGAAGAGGAAGAAAAAGGGCCAGTGTCTCCCAGgtaatgttgtggaattgttg
>ENSG00000269713 NBPF9 exon ENST00000615421.4_13,ENST00000584027.8_13 range=chr1:149063613-149063825 strand=-
actttttcccacttttccagGCTCAGCAGGGAGCTGCTGGATGAGAAAGGGCCTGAAGTCTTGCAGGACTCACTGGATAGATGTTATTCAACTCCTTCAGGTTGTCTTGAACTGACTGACTCATGCCAGCCCTACAGAAGTGCCTTTTACGTATTGGAGCAACAGCGTGTTGGCTTGGCTGTTGACATGGATGgtgagtacctttctatgaag
>ENSG00000260691 ANKRD20A1 exon ENST00000562196.5_9 range=chr9:67887256-67887324 strand=+
atatcccctttgctttgtagGGCCTCCTGCAAAACATCCTTCCTTGAAGgtaattaattatgtatattt

INFO:root: * running on sequence 'ENSG00000286185 AC242842.3 exon ENST00000621744.4_5 range=chr1:149482155-149482246 strand=+', guideLen=20, seqLen=92
INFO:root:Progress sTFqCkJ9RFySD1swZXbw - bwasw - Searching genome for one 100% identical match to input sequence
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (92 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /lab/solexa_sabatini/genya/crisporWebsite6/bin/Linux/bwa bwasw -T 20 /lab/solexa_sabatini/genya/crisporWebsite6/genomes/hg38/hg38.fa /tmp/crisporBestMatchttMxuE.fa
[main] Real time: 5.965 sec; CPU: 5.857 sec
INFO:root:Progress sTFqCkJ9RFySD1swZXbw - effScores - Calculating guide efficiency scores
INFO:root:Progress sTFqCkJ9RFySD1swZXbw - outcome - Calculating editing outcomes
Traceback (most recent call last):
  File "crispor.py", line 8304, in <module>
    main()
  File "crispor.py", line 8301, in main
    mainCommandLine()
  File "crispor.py", line 8110, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "crispor.py", line 4300, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "crispor.py", line 3845, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "crispor.py", line 3461, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "crispor.py", line 3402, in calcSaveEffScores
    mutScores = crisporEffScores.calcMutSeqs(pamIds, longSeqs, enz, scoreNames=mutScoreNames)
  File "/lab/solexa_sabatini/genya/crisporWebsite6/crisporEffScores.py", line 1311, in calcMutSeqs
    mutSeqDict = calcLindelScore(seqIds, seqs)
  File "/lab/solexa_sabatini/genya/crisporWebsite6/crisporEffScores.py", line 748, in calcLindelScore
    return runLindel(seqIds, trimSeqs(seqs, -33, 27))
  File "/lab/solexa_sabatini/genya/crisporWebsite6/crisporEffScores.py", line 724, in runLindel
    y_hat, fs = Lindel.Predictor.gen_prediction(seq,weights,prerequesites)
ValueError: too many values to unpack
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "crispor.py", line 7868, in delBatchDir
    raise Exception("cowardly refusing to remove many temp files")
Exception: cowardly refusing to remove many temp files
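`ValueError: too many values to unpack` on `y_hat, fs = Lindel.Predictor.gen_prediction(...)` means the call returned something other than a two-item tuple. One plausible cause, not confirmed here, is that the installed Lindel returns an error message string for some inputs. A string longer than two characters fails in exactly this way when unpacked into two names, so it is worth checking what `gen_prediction` returns for one of these sequences. The sketch below shows a defensive unpacking step. `unpack_prediction` is hypothetical, and skipping the Lindel score for such guides is a design choice, not CRISPOR's documented behaviour.

```python
def unpack_prediction(result):
    """Return (y_hat, fs) if result is a 2-item tuple or list, else None.

    Hypothetical guard for a call like gen_prediction(...): anything else,
    such as an error string, is reported as None instead of raising.
    """
    if isinstance(result, (tuple, list)) and len(result) == 2:
        return result[0], result[1]
    return None


if __name__ == "__main__":
    try:
        y_hat, fs = "Error: unexpected input"  # a str unpacks character by character
    except ValueError as exc:
        print("plain unpacking fails: %s" % exc)
    print(unpack_prediction("Error: unexpected input"))  # None
```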

KeyError: 'lindel'

Dear,
I get the following stderr output when running crisporWebsite for Cpf1 on the command line.
This is my command:

python crispor.py genome A01G000010.1.fa /public/home/software/crisporWebsite/genome.mine.tab -o /public/home/software/crisporWebsite/genome.Offtargets.mine.tab -g  -p TTTN

The stderr messages:

INFO:root: * running on sequence 'A01G000010.1.exon1', guideLen=23, seqLen=1432
INFO:root:Progress sB5Xn3UxqFPZmXaCAlYP - bwasw - Searching genome for one 100% identical match to input sequence
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (1432 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /public/home/software/crisporWebsite/bin/Linux/bwa bwasw -T 20 /public/home/CRISPR/genome.fa /tmp/crisporBestMatch2gfVeQ.fa
[main] Real time: 10.055 sec; CPU: 3.776 sec
INFO:root:Progress sB5Xn3UxqFPZmXaCAlYP - effScores - Calculating guide efficiency scores
Using Theano backend.
INFO:root:Progress sB5Xn3UxqFPZmXaCAlYP - outcome - Calculating editing outcomes
Traceback (most recent call last):
  File "crispor.py", line 8293, in <module>
    main()
  File "crispor.py", line 8291, in main
    mainCommandLine()
  File "crispor.py", line 8100, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "crispor.py", line 4295, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "crispor.py", line 3835, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "crispor.py", line 3454, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "crispor.py", line 3401, in calcSaveEffScores
    effScores[mutScoreName] = extractMutScores(mutScores[mutScoreName], pamIds)
KeyError: 'lindel'

Any help is much appreciated.
Thanks.

Best regards,
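The traceback fails on `effScores[mutScoreName] = extractMutScores(mutScores[mutScoreName], pamIds)`. `lindel` is among the requested outcome scores, but no `lindel` entry was produced for this run. Lindel models SpCas9 editing outcomes, so with `-p TTTN` (Cpf1) it is plausible that its scores are never computed while the name is still requested. The sketch below shows a loop that skips scores missing for the current enzyme. The function and parameter names are illustrative, not CRISPOR's.

```python
def collect_outcome_scores(mut_scores, score_names, extract):
    """Build {score name: extracted scores}, skipping scores that were not computed.

    mut_scores: dict of raw outcome scores keyed by score name
    score_names: names requested for this run (may include "lindel")
    extract: function turning one raw entry into per-guide scores
    """
    collected = {}
    for name in score_names:
        if name not in mut_scores:
            # e.g. an SpCas9-only model when the PAM is TTTN (Cpf1)
            continue
        collected[name] = extract(mut_scores[name])
    return collected
```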

Possibility to keep the off-target list of gRNAs in repeats?

Hi,

We are currently working on identifying gRNA sequences targeting sub-families of transposable elements. The purpose is to test "consensus" sequences that we identified to show that these sequences are globally shared among all the transposable elements from these sub-families.

For several sub-families, this works really well. We create a query with the "consensus" sequence we want to test. By comparing the locations of the off-targets, we know exactly which transposable element or gene each off-target comes from.

However, for some sub-families with a higher number of transposable elements (more than 60,000), neither the website nor the command-line tool lets me retrieve the tabular list of off-targets.

I tried changing some parameters in crispor.py to disable the repeat filter, but I still can't retrieve the off-target data, even though the gRNA is present in the guide RNA table.

Do you have any idea how I could retrieve the off-targets for a gRNA with many off-targets? (I also experimented with --maxOcc and --minAltPamScore but still can't retrieve them.)

Thank you for your time and thanks again for this tools.

Best,

Pierre-Emmanuel
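One possible explanation involves the `bwa samse -n 60000` call in CRISPOR's off-target step, at least in the version shown in the `samToBed` log on this page. bwa's documentation for `samse -n` says that when a read has more hits than this limit, the XA tag listing alternative hits is not written at all. A guide in a family with more than 60,000 copies would then arrive with no off-target list, which matches the symptom. Checking the SAM output for the affected guide would confirm or rule this out. The sketch below reads the bwa aln/samse tags `X0` (best hits), `X1` (suboptimal hits) and `XA` (alternative hits) from one SAM line. The helper names and the "probably dropped" heuristic are assumptions.

```python
def sam_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields (column 12 onward) of a SAM line."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def hit_summary(sam_line, max_alt=60000):
    """Summarise bwa aln/samse hit counts for one aligned guide.

    Returns (best_hits, suboptimal_hits, xa_entries, xa_probably_dropped).
    """
    tags = sam_tags(sam_line)
    best = int(tags.get("X0", 0))
    subopt = int(tags.get("X1", 0))
    xa = tags.get("XA", "")
    xa_entries = len([hit for hit in xa.split(";") if hit])
    # bwa omits XA entirely when a read has more hits than samse -n allows.
    dropped = "XA" not in tags and best + subopt > max_alt
    return best, subopt, xa_entries, dropped
```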
