yubioinfo / readfq
This project forked from lh3/readfq
Fast multi-line FASTA/Q reader in several programming languages
Readfq is a collection of routines for parsing the FASTA/FASTQ format. It seamlessly parses both FASTA and multi-line FASTQ with a simple interface.

Readfq was first implemented in a single C header file and then ported to Lua, Perl and Python as a single function of fewer than 50 lines. For users of scripting languages, I encourage you to copy and paste the function instead of using readfq as a library. It is always good to avoid unnecessary library dependencies.

Readfq also strives for efficiency. The C implementation is among the fastest (if not the fastest). The Python and Perl implementations are several to tens of times faster than the official Bio* implementations. If you can speed up readfq further, please let me know. I am not good at optimizing programs in scripting languages. Thank you.

As to licensing, the C implementation is distributed under the MIT license. Implementations in other languages are released without a license. Just copy and paste. You do not need to acknowledge me.

The following shows a brief example for each programming language:

    # Perl
    my @aux = undef; # this is for keeping intermediate data
    while (my ($name, $seq, $qual) = readfq(\*STDIN, \@aux)) { print "$seq\n"; }

    # Python: generator function
    for name, seq, qual in readfq(sys.stdin): print(seq)

    -- Lua: closure
    for name, seq, qual in readfq(io.stdin) do print(seq) end

    /* Go */
    package main

    import (
      "fmt"
      "bufio"
      "os"
      "github.com/drio/drio.go/bio/fasta"
    )

    func main() {
      var fqr fasta.FqReader
      fqr.Reader = bufio.NewReader(os.Stdin)
      for r, done := fqr.Iter(); !done; r, done = fqr.Iter() {
        fmt.Println(r.Seq)
      }
    }

    /* C */
    #include <zlib.h>
    #include <stdio.h>
    #include "kseq.h"
    KSEQ_INIT(gzFile, gzread)

    int main() {
      gzFile fp;
      kseq_t *seq;
      fp = gzdopen(fileno(stdin), "r");
      seq = kseq_init(fp);
      while (kseq_read(seq) >= 0) puts(seq->seq.s);
      kseq_destroy(seq);
      gzclose(fp);
      return 0;
    }

Some naive benchmarks.
To convert a FASTQ containing 25 million 100bp reads to FASTA, FASTX-Toolkit (parsing 4-line FASTQ only) takes 325.0 CPU seconds and EMBOSS' seqret takes 247.8 seconds. My seqtk, which uses the kseq.h library, finishes the task in 24.6 seconds, 10X faster. For retrieving 25k sequences by name from the same FASTQ, BioPython takes 963 seconds, while readfq.py takes 136 seconds; BioPerl takes more than 40 minutes (killed), while readfq.pl takes 273 seconds. Seqtk takes 29 seconds.
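
As an illustration of how one short function can handle both FASTA and multi-line FASTQ, here is a minimal Python sketch. It is not the readfq.py shipped in this repository: the name `parse_fastx` and the sample data are hypothetical, and error handling for truncated files is left out. The assumed approach is to collect sequence lines until a line starting with `>`, `@` or `+` appears, and for FASTQ to keep reading quality lines until they are as long as the sequence, because a quality line may itself begin with `@`.

```python
import io


def parse_fastx(handle):
    """Yield (name, seq, qual) from a FASTA/FASTQ text stream; qual is None for FASTA."""
    line = handle.readline()
    while line:
        if line[0] not in ">@":  # skip anything that is not a header line
            line = handle.readline()
            continue
        fields = line[1:].split()
        name = fields[0] if fields else ""
        chunks = []
        line = handle.readline()
        # Sequence lines run until the next header, a '+' separator, or EOF.
        while line and line[0] not in ">@+":
            chunks.append(line.strip())
            line = handle.readline()
        seq = "".join(chunks)
        if not line or line[0] != "+":  # no '+' separator: a FASTA record
            yield name, seq, None
            continue
        # FASTQ: read quality lines until they cover the whole sequence,
        # since a quality line may legitimately start with '@'.
        quals, qlen = [], 0
        line = handle.readline()
        while line and qlen < len(seq):
            q = line.strip()
            quals.append(q)
            qlen += len(q)
            line = handle.readline()
        yield name, seq, "".join(quals)


# Hypothetical sample mixing FASTA and multi-line FASTQ records.
sample = ">a desc\nACGT\nAC\n@b\nAC\nGT\n+\n@I\nII\n>c\nTT\n"
records = list(parse_fastx(io.StringIO(sample)))
for name, seq, qual in records:
    print(name, seq, qual)
```

Deciding where a FASTQ record ends by quality length, rather than by looking for the next `@`, is what makes multi-line FASTQ safe to parse in this sketch.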