
crisprdetect_2.2's Introduction

CRISPRDetect Version 2.2 help:

CRISPRDetect is a Perl program developed and tested on the Fedora 21 Linux operating system. CRISPRDetect.pl should run under any Unix-based operating system that has a working 'perl' executable [comes with default installations of most *nix operating systems]. If all the third-party dependencies are installed and available in the user/system $PATH, CRISPRDetect should run without any issues.

Installing the cd-hit-est program on Mac operating systems has been reported to cause issues. If you have trouble installing cd-hit-est, please refer to: weizhongli/cdhit#24

INSTALLATION:

Please make sure that the following third-party tools are installed on your system. The only CPAN Perl package required, 'Parallel' (Parallel::ForkManager), is provided in the 'lib' folder and should work as supplied. If it does not, either run "cpan Parallel::ForkManager" to install it, or run the following commands after making sure its dependencies (POSIX, Storable, File::Spec, File::Temp, File::Path 2.00 and Test::More 0.81_01) are installed on your system:

wget http://search.cpan.org/CPAN/authors/id/Y/YA/YANICK/Parallel-ForkManager-1.19.tar.gz
tar xvzf Parallel-ForkManager-1.19.tar.gz
cd Parallel-ForkManager-1.19
perl Makefile.PL && make test && make install

CRISPRDetect dependencies:

The following dependencies are needed by CRISPRDetect.

clustalw 	Download from ftp://ftp.ebi.ac.uk/pub/software/clustalw2/2.1/ 
water 		Comes with EMBOSS:6.3.1+ tools : 	Download from ftp://emboss.open-bio.org/pub/EMBOSS/EMBOSS-6.6.0.tar.gz
seqret 		Comes with EMBOSS:6.3.1+ tools : 	Download from ftp://emboss.open-bio.org/pub/EMBOSS/EMBOSS-6.6.0.tar.gz
RNAfold 	Comes with Vienna RNA package  :	Download the correct version specific for your operating system from http://www.tbi.univie.ac.at/RNA/#download
cd-hit-est 	Comes with cdhit package  	   :	Download from http://weizhongli-lab.org/cd-hit/download.php  	
blastn 		Comes with ncbi-blast+ package :	Download from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

CRISPRDetect requires clustalw2 to be available in the $PATH as clustalw. You can do that by "cp /PATH_TO_CLUSTALW2/clustalw2 /bin/clustalw" or by creating a symbolic link "ln -s /PATH_TO_CLUSTALW2/clustalw2 /bin/clustalw" where PATH_TO_CLUSTALW2 is the full path of the directory where the executable clustalw2 is present.

Once all the dependencies are installed, please check that they are successfully installed and available in the user/system PATH by typing the following:

clustalw -help
water -help
seqret -help
RNAfold -help
cd-hit-est -help
blastn -help
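
The six checks above can also be run in one pass. The following is a minimal sketch, not part of CRISPRDetect: the helper name check_deps is hypothetical, and it only tests whether each executable is found in PATH (it does not run the tools or check their versions).

```shell
#!/bin/sh
# check_deps is a hypothetical helper, not shipped with CRISPRDetect.
# It reports which of the given executables are found in PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_deps() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The six executables listed in this help text.
check_deps clustalw water seqret RNAfold cd-hit-est blastn \
    || echo "Install the missing tools or add them to PATH before running CRISPRDetect."
```

A tool reported as found may still fail to start (for example because of a missing shared library), so running each '-help' command as listed above remains the more thorough check.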

Basic syntax:

perl CRISPRDetect.pl -g NZ_CP006019.gbk -o NZ_CP006019_CRISPRDetect.txt > NZ_CP006019_CRISPRDetect.log

The above command runs CRISPRDetect with default parameters on a complete gbk file (a file containing both annotation and sequence) that has cas1 or cas2 annotated.

perl CRISPRDetect.pl -f test_multifasta.fa -o test_CRISPRDetect -check_direction 0 -array_quality_score_cutoff 3 -T 0 > test.log

The above command runs CRISPRDetect on a FASTA file with a lower score cutoff (3 rather than the default 4), because cas1 and cas2 are not annotated in a FASTA file and an annotated cas gene would otherwise score +1. This is appropriate for contigs/FASTA files. '-T 0' uses all processors rather than the default of 4. '-check_direction 0' turns off the direction check (not recommended).
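
When several FASTA files need the same settings, the command above can be generated once per file. This is a sketch under stated assumptions: the function name run_crisprdetect_batch, the DRY_RUN variable, the '.fa' extension and the example file names are all hypothetical; only the CRISPRDetect flags themselves come from this help text. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# run_crisprdetect_batch is a hypothetical wrapper, not part of CRISPRDetect.
# For each FASTA file given, it prints (DRY_RUN=1, the default) or runs
# (DRY_RUN=0) the FASTA example command from this help text.
# Output names are derived from the input name: sample.fa -> sample_CRISPRDetect
run_crisprdetect_batch() {
    for fasta in "$@"; do
        base=$(basename "$fasta" .fa)
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "perl CRISPRDetect.pl -f $fasta -o ${base}_CRISPRDetect -array_quality_score_cutoff 3 -T 0 > ${base}.log"
        else
            perl CRISPRDetect.pl -f "$fasta" -o "${base}_CRISPRDetect" \
                -array_quality_score_cutoff 3 -T 0 > "${base}.log"
        fi
    done
}

# Hypothetical input files; prints two command lines without running anything.
run_crisprdetect_batch contigs_A.fa contigs_B.fa
```

Setting DRY_RUN=0 runs the commands from the CRISPRDetect installation directory; review the printed commands first.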

Compulsory input parameters for CRISPRDetect:

-f/-g		FASTA/Genbank	Specify a FASTA formatted [e.g. -f test.fa] or Genbank formatted file [e.g. -g NZ_CP006019.gbk] containing the sequence. 		

Note: the default cutoff of 4 is appropriate for Genbank files that have cas1 or cas2 annotated; 3 is more appropriate for FASTA files.

-o		TEXT		Specify a text file that will contain the output [e.g. -o NC_003106_CRISPRDetect] 			
					Note: CRISPRDetect will provide two additional output files - one containing the filtered out arrays (e.g. NC_003106_CRISPRDetect.fp) 
					and a gff annotation file (e.g. NC_003106_CRISPRDetect.gff)

Basic options:

-h/-help	HELP		shows this help text
-q/-quiet	0 or 1		Switch off/on step by step progress reporting [default 0]	
-T			Threads		Specify number of parallel processes CRISPRDetect should use  [default 4; specify '-T 0' to use all processors]		
-tmp_dir	tmp/		This is the default directory where temporary files generated by CRISPRDetect and its dependencies will be stored.	

Parameters for putative CRISPR identification [optional]:

-word_length			11	This is the default word length CRISPRDetect uses to find the putative CRISPRs. Any positive integer >=6 can be used.
-minimum_word_repeatation	3	By default CRISPRDetect uses 3 repeating identical words to find putative CRISPRs. To find CRISPRs with 2 repeats, use -minimum_word_repeatation 2	
-max_gap_between_crisprs	125	By default the maximum gap is set to 125 nucleotides between the repeating identical seed words.
-repeat_length_cutoff		17	After the initial processing, putative CRISPRs with repeat lengths less than this value will be rejected.

Filtering parameters [optional]:

-minimum_repeat_length		23	Minimum length of repeats 
-minimum_no_of_repeats		3	Predicted CRISPRs with number of repeats less than this value will be excluded. To include CRISPRs with only 2 repeats, use -minimum_no_of_repeats 2
-array_quality_score_cutoff	4	Predicted CRISPRs with score less than this value will be excluded from the output file. 
								The CRISPRs with score >=0 and less than the specified value will be moved to the output.fp file [output refers to user given output filename]. Cutoff of 3 is more appropriate for fasta files.
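
The effect of -array_quality_score_cutoff described above can be illustrated as follows. CRISPRDetect applies this rule internally; the helper route_array is hypothetical and only restates the documented behaviour. Where arrays with a negative score end up is not stated in this help text, so the sketch labels that case "unspecified".

```shell
#!/bin/sh
# route_array is a hypothetical illustration of the documented cutoff rule.
# Usage: route_array SCORE CUTOFF
# Prints which file a predicted array with that score would be written to.
route_array() {
    awk -v s="$1" -v c="$2" 'BEGIN {
        if (s + 0 >= c + 0)   print "output"       # kept in the main output file
        else if (s + 0 >= 0)  print "output.fp"    # moved to the filtered-out file
        else                  print "unspecified"  # not covered by this help text
    }'
}

route_array 4.2 4   # at or above the default cutoff
route_array 3.5 4   # below the default cutoff of 4
route_array 3.5 3   # same score with the FASTA-oriented cutoff of 3
```

The second and third calls show why lowering the cutoff to 3 matters for FASTA input: the same array moves from the .fp file into the main output.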

Additional parameters [optional]:

-left_flank_length		500	This is the default length of the 5' (upstream) region of the CRISPRs.
-right_flank_length		500	This is the default length of the 3' (downstream) region of the CRISPRs.		

Advanced options [optional]:

To test different methods as specified in the literature, open the CRISPRDetect.pl program with any text editor [e.g. gedit in RHEL/Fedora/CentOS, or vi in any *nix OS, or notepad in Windows OS] and change the parameters in the topmost section of the script. To toggle an individual method, locate its variable (the names start with '$check_') and change the value to 1 (i.e. the method will be applied) or 0 (i.e. the method will not be applied).

Examples:
	
	Direction specific options:
	--------------------------
		$check_direction=0;			[ Default is 1, making it 0 will turn the method off.] 
			
	To change the parameter(s) of a particular method (such as check_array_degeneracy) change the nested variables under that particular method.
	
		$check_array_degeneracy=1;	 
			$array_degeneracy_score=0.41; 		[ Default: PPV (0.91) - 0.50 ]
			$permitted_mutation_per_array=0; 	[ Default 0 ]

Changing to '$permitted_mutation_per_array=2;' will instruct the program to allow a maximum of 2 bases as permitted mutations per CRISPR array.
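
The same edit can be scripted instead of made in a text editor. This is a sketch under assumptions: the helper set_perl_var is hypothetical, and it assumes each setting sits on its own line in the form '$name=value;' with a numeric value, as in the examples above. If the script formats these lines differently, the pattern will not match and the copy will be unchanged. The original file is never modified; the result is written to a copy ending in '.edited'.

```shell
#!/bin/sh
# set_perl_var is a hypothetical helper, not part of CRISPRDetect.
# Usage: set_perl_var SCRIPT NAME VALUE
# Writes SCRIPT.edited with the line '$NAME=<number>;' changed to '$NAME=VALUE;'.
# Assumes the assignment starts its line (after optional whitespace).
set_perl_var() {
    name=$2
    value=$3
    sed 's/^\([[:space:]]*\)\$'"$name"'[[:space:]]*=[[:space:]]*[0-9.]*;/\1$'"$name=$value"';/' \
        "$1" > "$1.edited"
}

# Demonstration on a small stand-in file (not the real CRISPRDetect.pl).
demo=$(mktemp)
printf '%s\n' '$check_direction=1;' '$check_array_degeneracy=1;' \
    '  $permitted_mutation_per_array=0;' > "$demo"
set_perl_var "$demo" permitted_mutation_per_array 2
cat "$demo.edited"
rm -f "$demo" "$demo.edited"
```

After checking the edited copy, it can be run in place of the original script, which keeps the shipped defaults available for comparison.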

NOTE:

The 'tmp' folder in the CRISPRDetect installation directory should have read and write permissions. An easy way to do that is by issuing the command 'chmod -R 755 . && chmod 777 tmp' from the CRISPRDetect installation directory.

If you use CRISPRDetect, then please cite:

Biswas, A., Staals, R. H., Morales, S. E., Fineran, P. C. & Brown, C. M. CRISPRDetect: a flexible algorithm to define CRISPR arrays. BMC Genomics 17, 356 (2016).

For version updates and bug fixes refer to https://github.com/ambarishbiswas/CRISPRDetect_2.2 or http://bioanalysis.otago.ac.nz/CRISPRDetect

crisprdetect_2.2's Issues

how to calculate the DR_count?

Dear Dr. Biswas

I hope this message finds you well. I am writing to express my admiration for your publication in Current Biology titled "High viral abundance and low diversity are associated with increased CRISPR-Cas prevalence across microbial ecosystems." Your work has provided valuable insights into the relationship between viral abundance, microbial diversity, and the prevalence of CRISPR-Cas systems.
Based on Dr. Sean's help, I have calculated the abundance of a sample. I used the DR database as the query sequence and the reads from the metagenomic sequencing of 'sample1' as the database. Using the command 'blastn -task blastn-short -query repeat.fasta -db sample_read -out blast.out.txt -perc_identity 100 -qcov_hsp_perc 100 -num_threads 10 -outfmt 6', I generated a matrix.
DR_id sample1 sample2
repeat1 0 5
repeat10061 4 2
repeat101 1 5
repeat10100 0 0
repeat10101 14 1
repeat10109 12 41
repeat10110 0 2
repeat10112 0 0
repeat10124 45 3
repeat10128 1 6
repeat10183 46 7
I inquired with Dr. Bean whether this matrix is the result of the BLAST output, because the data.csv he provided on GitHub is not in this form, but rather the total Dr_count for each sample. He suggested I contact you, hoping you can give me some advice, which is very important to me. Thank you very much.
best,
Dr Ge
