Fasta Utilities

A collection of scripts developed to interact with FASTA, FASTQ and SAM files. All the scripts use the ReadFastx module I wrote, which reads either a FASTA or FASTQ file record by record. They also use the FileBar module, which displays a terminal progress bar for a file as it is processed. ReadSam is the SAM equivalent.
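
To illustrate what reading "by record" means, here is a minimal Python sketch of a reader that yields one FASTA or FASTQ record at a time. It is not the ReadFastx implementation (which is a Perl module); the names `Record` and `read_fastx` are hypothetical, and the sketch assumes four-line FASTQ records.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class Record(NamedTuple):
    header: str
    sequence: str
    quality: Optional[str]  # None for FASTA records


def read_fastx(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTA or FASTQ stream, one at a time.

    Hypothetical helper: assumes four-line FASTQ records and allows
    multi-line FASTA sequences.
    """
    line = handle.readline()
    while line:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header, chunks = line[1:], []
            line = handle.readline()
            # Collect sequence lines until the next header or end of file.
            while line and not line.startswith(">"):
                chunks.append(line.strip())
                line = handle.readline()
            yield Record(header, "".join(chunks), None)
        elif line.startswith("@"):
            header = line[1:]
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line
            quality = handle.readline().strip()
            yield Record(header, sequence, quality)
            line = handle.readline()
        else:
            line = handle.readline()  # skip blank or unexpected lines


if __name__ == "__main__":
    for record in read_fastx(io.StringIO(">seq1\nACGT\nAC\n>seq2\nGG\n")):
        print(record.header, len(record.sequence))
```

Because the reader is a generator, only one record is held in memory at a time, which is what makes record-wise processing practical for large files.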

Conversion

  • 2big.pl - takes a bed, SAM, or wiggle file and creates a big version of it to upload to UCSC
  • fastq2fasta.pl - converts FASTQ to FASTA
  • sam2fastq.pl - converts SAM format to FASTQ format
  • mate_pair2paired_end.pl - converts mate pair reads to paired end orientation

Reformatting

  • fix_headers.pl - fixes the FASTQ header by removing spaces and optionally appending a suffix
  • remap_file.pl - takes a file with tab delimited mappings and substitutes each of the first terms for each of the second terms
  • standardize_names.pl - renames FASTA files from NCBI into UCSC nomenclature (chr##)
  • unique_headers.pl - reads a FASTA file and ensures all of the names are unique
  • wrap.pl - limits FASTA lines to 80 characters
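
As an illustration of the line wrapping that wrap.pl describes, a minimal Python sketch follows. The function names are hypothetical and this is not the script's own code; only the 80-character default comes from the description above.

```python
from typing import List


def wrap_sequence(sequence: str, width: int = 80) -> List[str]:
    """Split a sequence into consecutive chunks of at most `width` characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def format_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Return one FASTA record as text with wrapped sequence lines."""
    return "\n".join([">" + header] + wrap_sequence(sequence, width)) + "\n"


if __name__ == "__main__":
    print(format_fasta("example", "ACGT" * 50), end="")
```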

Modification

  • bisulfite_convert.pl - bisulfite converts the sequences given to it
  • merge_records.pl - merges all the input records into one record; by default it uses the name of the first record, which can be changed with -name
  • reverse_complement.pl - takes sequences and reverse complements them
  • trim_fasta.pl - trims a fastx file to x bp
  • pairs_sorted.pl - takes two files of reads sorted by header, and outputs two files containing those reads which have pairs
  • pairs_unsorted.pl - gets the pairs of the reads in the first file from the second file; pairs are matched by header name
  • regex_fasta.pl - applies the given regex to the FASTA headers or sequence
  • remove_ambiguous.pl - removes ambiguity codes from FASTA files
  • splice.pl - splices a FASTA file given a gff file
  • split_fasta.pl - splits a multi FASTA file into multiple files, can split in different ways
  • subset_fasta.pl - subsets a FASTA file
  • trans_fasta.pl - translates a FASTA cDNA to protein
  • generate_fasta.pl - creates a random FASTA file
  • consensus.pl - generates a consensus FASTA file from a bam file
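
Two of the modifications listed above are simple string transformations, sketched here in Python for illustration. These are not the scripts' implementations. The bisulfite sketch assumes every cytosine is unmethylated and converts all C to T on the given strand; whether bisulfite_convert.pl treats methylation context differently is not stated here.

```python
# Complement table covering the standard bases plus IUPAC ambiguity codes.
# S and W are their own complements, so they are left unmapped.
_COMPLEMENT = str.maketrans(
    "ACGTURYKMBDHVNacgturykmbdhvn",
    "TGCAAYRMKVHDBNtgcaayrmkvhdbn",
)


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def bisulfite_convert(sequence: str) -> str:
    """In-silico bisulfite conversion (assumption: all C are unmethylated)."""
    return sequence.replace("C", "T").replace("c", "t")


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(bisulfite_convert("ACGTC"))
```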

Filtering

  • filter_align.pl - filters alignments from a bam or SAM file
  • filter_reads.pl - filters aligned reads from a file by mapping with bowtie2
  • get_fasta.pl - selects FASTA records which match or don't match a pattern
  • in_list.pl - reads a list of headers, and a fastx file and outputs records which are in the list
  • size_select.pl - returns the sequences with lengths between the values specified by -low and -high
  • sort.pl - sorts a FASTA file using GNU sort; can sort by header, sequence, length, etc.
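
Length filtering of the kind size_select.pl describes can be sketched as a generator over (header, sequence) pairs. This is an illustrative Python sketch, not the script itself; treating both bounds as inclusive is an assumption, and the `low` and `high` parameters simply mirror the -low and -high options named above.

```python
from typing import Iterable, Iterator, Optional, Tuple

FastaRecord = Tuple[str, str]  # (header, sequence)


def size_select(
    records: Iterable[FastaRecord],
    low: int = 0,
    high: Optional[int] = None,
) -> Iterator[FastaRecord]:
    """Yield records whose sequence length lies in [low, high].

    Assumption: bounds are inclusive; high=None means no upper bound.
    """
    for header, sequence in records:
        length = len(sequence)
        if length >= low and (high is None or length <= high):
            yield header, sequence


if __name__ == "__main__":
    reads = [("a", "ACGT"), ("b", "AC"), ("c", "ACGTACGT")]
    for header, sequence in size_select(reads, low=3, high=5):
        print(header, sequence)
```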

Information

  • avg_coverage.pl - gets the average coverage per sequence from a bam file
  • lengths.pl - reports the length of each record
  • calcN.pl - takes a file of FASTA lengths, or a FASTA or FASTQ file directly, and calculates the nX of the file, by default N50
  • CpG_count.pl - counts the number of CpGs in a FASTA file
  • distances.pl - gets the within-group bitscore distance of all the records in a FASTA file using BLAST
  • fasta_head.pl - emulates unix head for FASTA and FASTQ files
  • fasta_tail.pl - emulates unix tail for FASTA and FASTQ files
  • percent_GC.pl - calculates percent GC for each FASTA record in a file, as well as the total GC content
  • sam_lengths.pl - gets the sequence lengths from a SAM file
  • size.pl - gets the total size of a FASTA file and the number of sequences
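
For readers unfamiliar with the nX statistic that calcN.pl reports, the Python sketch below uses the common definition: sort the lengths from longest to shortest and return the length at which the running total first reaches X% of the total. A percent GC helper is included as well. Both are illustrative sketches with hypothetical function names, not the scripts' code, and the scripts may handle edge cases (such as ambiguous bases) differently.

```python
from typing import Iterable


def calc_nx(lengths: Iterable[int], x: float = 50.0) -> int:
    """Return the nX value (N50 by default) for a collection of lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Equivalent to running >= total * x / 100, without the division.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


def percent_gc(sequence: str) -> float:
    """Percentage of G and C characters in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    print(calc_nx([80, 70, 50, 40, 30, 20]))
    print(percent_gc("GCAT"))
```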

Bed scripts

  • absolute_coordinates.pl - takes a file with the chromosome and location and a file of chromosome sizes, and converts the coordinates to an absolute scale for plotting
  • bed2igv.pl - converts a bed file to an IGV snapshot script
  • combine_bed.pl - combines bed files
  • gff2bed.pl - converts a gff file to a bed file
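
The absolute scale mentioned for absolute_coordinates.pl can be pictured as laying the chromosomes end to end. The Python sketch below is an illustration under that assumption, with hypothetical function names; it assumes chromosomes are placed in the order they appear in the sizes file, which may not match the script's behaviour.

```python
from typing import Dict, Iterable, Tuple


def chromosome_offsets(sizes: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Map each chromosome to the summed size of all chromosomes before it."""
    offsets: Dict[str, int] = {}
    running = 0
    for name, size in sizes:
        offsets[name] = running
        running += size
    return offsets


def to_absolute(chromosome: str, position: int, offsets: Dict[str, int]) -> int:
    """Convert a per-chromosome position to a genome-wide coordinate."""
    return offsets[chromosome] + position


if __name__ == "__main__":
    offsets = chromosome_offsets([("chr1", 100), ("chr2", 50), ("chr3", 25)])
    print(to_absolute("chr2", 10, offsets))
```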

Miscellaneous

  • align_progress.pl - given the input filename and the output filename, figures out the last line using tail, then greps for that header in the input, and works out the percentage that way
  • blast_information.pl - gets sequence information from GI numbers in a BLAST results file
  • fetch_entrez.pl - downloads a number of sequences from an Entrez query
  • fetch_gi.pl - downloads FASTA files from NCBI and outputs a FASTA file
  • fetch_sra.pl - downloads the SRA sequences from NCBI using Aspera and outputs a FASTQ file
  • generate_map.pl - remaps FASTA sequences from the first file to FASTA sequences from the second file, matches by hashing the sequence
  • mpileup_counts.pl - parses an mpileup file and gets the base counts
  • rename_script.pl - renames a file, changing any references to the old name in the file to the new name
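
Matching records "by hashing the sequence", as described for generate_map.pl, can be sketched as a dictionary lookup keyed on a digest of each sequence. The Python below is an illustration only: the function names, the choice of MD5, and the case-insensitive comparison are all assumptions, not details taken from the script.

```python
import hashlib
from typing import Dict, Iterable, Tuple

FastaRecord = Tuple[str, str]  # (header, sequence)


def sequence_digest(sequence: str) -> str:
    """Digest of a sequence (assumption: comparison ignores case)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def generate_map(
    first: Iterable[FastaRecord], second: Iterable[FastaRecord]
) -> Dict[str, str]:
    """Map headers in `first` to headers in `second` with identical sequences."""
    by_digest = {sequence_digest(seq): header for header, seq in second}
    mapping: Dict[str, str] = {}
    for header, seq in first:
        digest = sequence_digest(seq)
        if digest in by_digest:
            mapping[header] = by_digest[digest]
    return mapping


if __name__ == "__main__":
    old = [("a", "ACGT"), ("b", "GGCC"), ("c", "TTTT")]
    new = [("x", "ggcc"), ("y", "ACGT")]
    print(generate_map(old, new))
```

Keying on a fixed-size digest rather than the full sequence keeps the lookup table small when sequences are long.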

Installation

Install with an optional prefix; omit the prefix if you want to install system-wide.

perl Makefile.PL PREFIX=$HOME
make
make install

fasta_utilities's People

Contributors

jimhester, shenwei356

fasta_utilities's Issues

problem with ReadFastx::Fastq::Seq using fasta2fastq

I tried fasta2fastq.pl after installing your package without any problem (perl Makefile.PL/make/make install), but I get this error message:
Can't locate object method "new" via package "ReadFastx::Fastq::Seq" (perhaps you forgot to load "ReadFastx::Fastq::Seq"?) at /usr/local/bin/fasta2fastq.pl line 60, <$_[...]> line 2.

consensus.pl cannot open file

Hi,

I definitely have the correct file name and path, but I always get this error:

home> ~/bin/fasta_utils/consensus.pl test_sorted.bam
Use of uninitialized value $filename in string at /OSM/HOME-MEL/all29c/bin/fasta_utils/consensus.pl line 101.
Use of uninitialized value $filename in concatenation (.) or string at home/bin/fasta_utils/consensus.pl line 101.
No such file or directory:Could not open

Thanks,

Theo

Error when running `split_fasta.pl`

Hi @jimhester , it failed to run split_fasta.pl

$ split_fasta.pl -amount 2 dataset_A.fa -force
dataset_A.fa:  98% [==================================*============================================  ]0m00s
 LeftNot an ARRAY reference at /usr/local/bin/split_fasta.pl line 118, <$_[...]> line 175365.
$ split_fasta.pl -amount 2 dataset_B.fa -force                                
dataset_B.fa:  56% [*=============================================                                   ]0m00s 
LeftNot an ARRAY reference at /usr/local/bin/split_fasta.pl line 118, <$_[...]> line 7.
$ split_fasta.pl -amount 2 dataset_RNA.fasta -force                           
Not an ARRAY reference at /usr/local/bin/split_fasta.pl line 118, <$_[...]> line 101.

I checked the code and printed the type of $seq_ref, it was REF, and there was error information:

Virtual timer expired

By the way, I'm writing a FASTA kit, named fakit, and I've done some benchmarks with similar tools, including fasta_utilities. Could you please take a few minutes to look and give some advice?

sincerely,
Wei

Compilation issue

Hello Jim,
I am trying to run unique_headers.pl, but without success. I am using WSL on Windows and, after following the instructions for compilation, I still get the following error:
"
unique_headers.pl --help
Can't locate ReadFastx.pm in @INC (you may need to install the ReadFastx module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /usr/local/bin/unique_headers.pl line 31.
BEGIN failed--compilation aborted at /usr/local/bin/unique_headers.pl line 31.
"
After cloning the repository and running "perl Makefile.PL PREFIX=$HOME", I get the following warnings:

Warning: prerequisite Class::XSAccessor 0 not found.
Warning: prerequisite Inline 0 not found.
Warning: prerequisite List::MoreUtils 0 not found.
Warning: prerequisite Moose 0 not found.
Warning: prerequisite MooseX::NonMoose 0 not found.
Warning: prerequisite Readonly 0 not found.
Warning: prerequisite Term::ProgressBar 0 not found.
Warning: prerequisite namespace::autoclean 0 not found.
Generating a Unix-style Makefile
Writing Makefile for fasta_utilities
Writing MYMETA.yml and MYMETA.json

After that, make and make install look fine.
"/usr/bin/perl" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/absolute_coordinates.pl
cp scripts/wrap.pl blib/script/wrap.pl
"/usr/bin/perl" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/wrap.pl
Manifying 40 pod documents
Manifying 22 pod documents

make install
"Appending installation info to /home/moraisd/lib/x86_64-linux-gnu/perl/5.22.1/perllocal.pod"

But I still get the error.

Could you show me how to solve it? Thanks.
