Fasta Utilities

A collection of scripts for working with FASTA, FASTQ and SAM files. All the scripts use the ReadFastx module I wrote, which reads either a FASTA or FASTQ file record by record. They also use the FileBar module, which displays a terminal progress bar as a file is processed. ReadSam is the SAM equivalent of ReadFastx.
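
To illustrate what reading "by record" means, here is a minimal Python sketch of a reader that yields one record at a time from either format. It is not the ReadFastx implementation (which is a Perl module); the names `Record` and `read_fastx` are hypothetical, and the sketch assumes four-line FASTQ records with unwrapped sequence and quality lines.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class Record(NamedTuple):
    header: str
    sequence: str
    quality: Optional[str]  # None for FASTA records


def read_fastx(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTA or FASTQ stream, one at a time."""
    line = handle.readline()
    while line:
        line = line.rstrip()
        if line.startswith(">"):
            # FASTA: the sequence may span several lines until the next header.
            header, chunks = line[1:], []
            line = handle.readline()
            while line and not line.startswith(">"):
                chunks.append(line.strip())
                line = handle.readline()
            yield Record(header, "".join(chunks), None)
        elif line.startswith("@"):
            # FASTQ (assumed four lines per record): header, sequence, '+', quality.
            header = line[1:]
            sequence = handle.readline().strip()
            handle.readline()  # skip the '+' separator line
            quality = handle.readline().strip()
            yield Record(header, sequence, quality)
            line = handle.readline()
        else:
            # Skip blank or unexpected lines between records.
            line = handle.readline()


# Usage: any text file handle works; StringIO stands in for a file here.
for record in read_fastx(io.StringIO(">seq1 example\nACGT\nAC\n")):
    print(record.header, len(record.sequence))
```

Yielding records lazily keeps memory use flat on large files, which is the usual reason for a record-at-a-time interface.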

Conversion

2big.pl - takes a bed, SAM, or wiggle file and creates a big version of it to upload to UCSC
fastq2fasta.pl - converts FASTQ to FASTA
sam2fastq.pl - converts a SAM file to FASTQ format
mate_pair2paired_end.pl - converts mate pair reads to paired end orientation

Reformatting

fix_headers.pl - fixes the FASTQ header by removing spaces and optionally appending a suffix
remap_file.pl - takes a file with tab delimited mappings and substitutes each of the first terms for each of the second terms
standardize_names.pl - renames FASTA files from NCBI into UCSC nomenclature (chr##)
unique_headers.pl - reads a FASTA file and ensures all of the names are unique
wrap.pl - limits FASTA lines to 80 characters
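
As an illustration of the line wrapping that wrap.pl is described as doing, here is a minimal Python sketch. It is not the script's actual code; the function names `wrap_sequence` and `format_fasta` are hypothetical, and only the 80-character default comes from the description above.

```python
from typing import List


def wrap_sequence(sequence: str, width: int = 80) -> List[str]:
    """Split a sequence into consecutive chunks of at most `width` characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def format_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Render one FASTA record with the sequence wrapped to `width` columns."""
    lines = [">" + header] + wrap_sequence(sequence, width)
    return "\n".join(lines) + "\n"


# Usage: a 170 bp sequence becomes lines of 80, 80 and 10 characters.
print(format_fasta("example", "ACGTA" * 34), end="")
```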

Modification

bisulfite_convert.pl - bisulfite converts the sequences given to it
merge_records.pl - merges all the input records into one record; by default it uses the name of the first record, which can be changed with -name
reverse_complement.pl - takes sequences and reverse complements them
trim_fasta.pl - trims a fastx file to x bp
pairs_sorted.pl - takes two files of reads sorted by header, and outputs two files containing those reads which have pairs
pairs_unsorted.pl - gets the pairs of the reads in the first file from the second file; pairs are matched by header name
regex_fasta.pl - applies the given regex to the FASTA headers or sequence
remove_ambiguous.pl - removes ambiguity codes from FASTA files
splice.pl - splices a FASTA file given a gff file
split_fasta.pl - splits a multi FASTA file into multiple files, can split in different ways
subset_fasta.pl - subsets a FASTA file
trans_fasta.pl - translates a FASTA cDNA to protein
generate_fasta.pl - creates a random FASTA file
consensus.pl - generates a consensus FASTA file from a bam file
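
Reverse complementing, as listed for reverse_complement.pl above, can be sketched in a few lines of Python. This is an illustration rather than the script's implementation; the function name is hypothetical, and handling IUPAC ambiguity codes and preserving letter case are assumptions of this sketch.

```python
# Complement table covering the four bases plus IUPAC ambiguity codes.
_BASES = "ACGTRYSWKMBDHVN"
_COMPS = "TGCAYRSWMKVHDBN"
_COMPLEMENT = str.maketrans(_BASES + _BASES.lower(), _COMPS + _COMPS.lower())


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Usage
print(reverse_complement("ATGC"))  # GCAT
```

Characters outside the table (for example gap symbols) pass through unchanged, since `str.translate` leaves unmapped characters as they are.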

Filtering

filter_align.pl - filters alignments from a bam or SAM file
filter_reads.pl - filters aligned reads from a file by mapping with bowtie2
get_fasta.pl - selects FASTA records which match or don't match a pattern
in_list.pl - reads a list of headers, and a fastx file and outputs records which are in the list
size_select.pl - returns the sequences with lengths between the values specified by -low and -high
sort.pl - sorts a FASTA file using GNU sort, can sort by header, sequence, length etc.

Information

avg_coverage.pl - gets the average coverage per sequence from a bam file
lengths.pl - length of each record
calcN.pl - takes a file of FASTA lengths, or a FASTA or FASTQ file directly, and calculates the NX of the file (N50 by default)
CpG_count.pl - counts the number of CpGs in a FASTA file
distances.pl - get the within group bitscore distance of all the records in a FASTA file using blast
fasta_head.pl - emulates unix head for FASTA and FASTQ files
fasta_tail.pl - emulates unix tail for FASTA and FASTQ files
percent_GC.pl - calculates percent GC for each FASTA record in a file, as well as the total GC content
sam_lengths.pl - gets the sequence lengths from a SAM file
size.pl - gets the total size of a FASTA file and the number of sequences
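
Two of the statistics listed above are easy to misread, so here is a small Python sketch of them under the usual definitions. N50 is taken to be the length L such that sequences of length L or longer contain at least half of the total bases, and percent GC is taken over all characters in the sequence. The function names are hypothetical, and whether calcN.pl and percent_GC.pl handle edge cases (ties, N bases) the same way is not confirmed by this README.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: int = 50) -> int:
    """Return the NX statistic (N50 by default) for a collection of lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


def percent_gc(sequence: str) -> float:
    """Percentage of G and C characters; assumes every character counts in the total."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(sequence)


# Usage: total length is 54, and 10 + 9 + 8 = 27 reaches half of it.
print(nx([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(percent_gc("GGCCAATT"))  # 50.0
```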

Bed scripts

absolute_coordinates.pl - takes a file with the chromosome and location and a file of chromosome sizes, and converts the coordinates to an absolute scale for plotting
bed2igv.pl - converts a bed file to an IGV snapshot script
combine_bed.pl - combines bed files
gff2bed.pl - converts a gff file to a bed file

Miscellaneous

align_progress.pl - given the input filename and the output filename, figures out the last line using tail, then greps for that header in the input, and works out the percentage that way
blast_information.pl - gets sequence information from gi numbers from a blast results file
fetch_entrez.pl - downloads a number of sequences from an Entrez query
fetch_gi.pl - downloads FASTA files from NCBI and outputs a FASTA file
fetch_sra.pl - downloads the sra sequences from NCBI using aspera and outputs a FASTQ file
generate_map.pl - remaps FASTA sequences from the first file to FASTA sequences from the second file, matches by hashing the sequence
mpileup_counts.pl - parses a mpileup file and gets the base counts
rename_script.pl - renames a file, changing any references to the old name in the file to the new name

Installation

Install with an optional prefix; omit the prefix to install system-wide.

perl Makefile.PL PREFIX=$HOME
make
make install

Compilation issue

Hello Jim,
I am trying to run unique_headers.pl, without success. I am using WSL on Windows, and after following the installation instructions I still get the following error:
"
unique_headers.pl --help
Can't locate ReadFastx.pm in @INC (you may need to install the ReadFastx module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /usr/local/bin/unique_headers.pl line 31.
BEGIN failed--compilation aborted at /usr/local/bin/unique_headers.pl line 31.
"
After cloning the repository and running "perl Makefile.PL PREFIX=$HOME", I get the following warnings:

Warning: prerequisite Class::XSAccessor 0 not found.
Warning: prerequisite Inline 0 not found.
Warning: prerequisite List::MoreUtils 0 not found.
Warning: prerequisite Moose 0 not found.
Warning: prerequisite MooseX::NonMoose 0 not found.
Warning: prerequisite Readonly 0 not found.
Warning: prerequisite Term::ProgressBar 0 not found.
Warning: prerequisite namespace::autoclean 0 not found.
Generating a Unix-style Makefile
Writing Makefile for fasta_utilities
Writing MYMETA.yml and MYMETA.json

After that, make and make install look fine.
"/usr/bin/perl" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/absolute_coordinates.pl
cp scripts/wrap.pl blib/script/wrap.pl
"/usr/bin/perl" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/wrap.pl
Manifying 40 pod documents
Manifying 22 pod documents

make install
"Appending installation info to /home/moraisd/lib/x86_64-linux-gnu/perl/5.22.1/perllocal.pod"

But I still get the error.

Could you show me how to solve it? Thanks.

jimhester / fasta_utilities