philres / catfishq
Cat FASTQ files
License: MIT License
-d/--deduplicate
Don't print reads that have been seen before.
The current solution is slow.
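A minimal sketch of streaming deduplication. It assumes a read counts as a duplicate when its read ID has been seen before; the issue does not say how duplicates are defined or how the current solution works. A set of seen keys gives average O(1) membership checks. Memory grows with the number of unique reads, so keeping only the IDs (not whole records) in the set keeps it small.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def deduplicate(records: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Yield only the first record seen for each key (for example, each read ID)."""
    seen = set()  # stores keys only, not full records
    for record in records:
        k = key(record)
        if k in seen:
            continue  # duplicate: an earlier record with this key was already yielded
        seen.add(k)
        yield record
```

Because it is a generator, reads can be written out as they arrive instead of collecting all files first.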
Reads with an empty sequence field in FASTQ files are written as FASTA entries in the output:
@ReadID1
$
+
$
@ReadID2
is output as:
ReadID1
$
@ReadID2
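A writer that avoids this should decide the output format by whether a quality string is present at all, not by whether it is non-empty, because an empty string is falsy in Python. A minimal sketch, assuming records expose name, sequence, quality and comment fields (named after pysam's FastxRecord attributes) and that quality is None for FASTA input; the Record type and format_record are hypothetical, and the actual cause in catfishq is not confirmed here.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    # Hypothetical record type; field names mirror pysam.FastxRecord attributes.
    name: str
    sequence: str
    quality: Optional[str]  # None for FASTA input, "" for an empty FASTQ read
    comment: Optional[str] = None


def format_record(rec: Record) -> str:
    header = rec.name if not rec.comment else f"{rec.name} {rec.comment}"
    # Check presence, not truthiness: quality == "" still means FASTQ input.
    if rec.quality is not None:
        return f"@{header}\n{rec.sequence}\n+\n{rec.quality}\n"
    return f">{header}\n{rec.sequence}\n"
```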
catfishq fails with a timestamp parsing error when --print-start-time is used on FASTQ files generated by Guppy version "5.0.16+b9fcd7b", likely due to a change in the timestamp format.
Python 3.8, catfishq 1.1.5, pysam 0.17.0
catfishq --print-start-time -r fastq_pass/
Traceback (most recent call last):
  File "/miniconda3/bin/catfishq", line 8, in <module>
    sys.exit(main())
  File "/miniconda3/lib/python3.8/site-packages/catfishq/cat_fastq.py", line 351, in main
    min_start_time=get_start_time(args.FASTQ,args.RECURSIVE)
  File "/miniconda3/lib/python3.8/site-packages/catfishq/cat_fastq.py", line 264, in get_start_time
    min_start_time=compare_start_time(entry.comment,min_start_time)
  File "/miniconda3/lib/python3.8/site-packages/catfishq/cat_fastq.py", line 206, in compare_start_time
    start_time = datetime.strptime(start_time_str,'%Y-%m-%dT%H:%M:%SZ')
  File "/miniconda3/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/miniconda3/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2021-10-19T04:24:16.235874+01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
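The traceback shows compare_start_time parsing with the fixed format '%Y-%m-%dT%H:%M:%SZ', which only matches the older "...Z" form and not the fractional seconds and +01:00 offset in the new one. A minimal sketch of a parser that accepts both. The helper names are hypothetical, and extracting the value assumes Guppy-style whitespace-separated key=value fields containing start_time= in the read comment. datetime.fromisoformat() in Python 3.7 to 3.10 rejects a trailing "Z", so it is rewritten as +00:00 first.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def parse_start_time(value: str) -> datetime:
    """Parse an ISO 8601 start time ending in 'Z' or an explicit UTC offset."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # fromisoformat() before 3.11 rejects 'Z'
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return dt.astimezone(timezone.utc)


def start_time_from_comment(comment: str) -> Optional[datetime]:
    # Assumes whitespace-separated key=value fields, as in Guppy read headers.
    for field in comment.split():
        if field.startswith("start_time="):
            return parse_start_time(field[len("start_time="):])
    return None


def min_start_time(comments: Iterable[str]) -> Optional[datetime]:
    times = [t for t in map(start_time_from_comment, comments) if t is not None]
    return min(times) if times else None
```

Normalising every value to an aware UTC datetime keeps comparisons valid even when files mix the old and new timestamp formats.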
Hello,
The .tar.gz does not include any license information. Please consider adding the LICENSE file to the archive and also specifying the license in setup.py. Ideally, every source file would mention it too.
Many thanks!
Steffen (while packaging catfishq for Debian)
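For the packaging request above, a minimal setup.py fragment declaring the license might look like the following. Only license-related arguments are shown; everything else in catfishq's real setup.py (version, packages, entry points) is omitted. The SPDX comment on the first line is one common way to mark the license in each source file. Shipping the file itself in the .tar.gz is usually done with an "include LICENSE" line in MANIFEST.in.

```python
# SPDX-License-Identifier: MIT
# setup.py fragment: only the license-related arguments are shown.
from setuptools import setup

setup(
    name="catfishq",
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
    ],
)
```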
I get the following error when running this command with catfishq 1.1.15:
catfishq -r flowcell_xyz/ --log DEBUG > catf.fastq
Searching flowcell_xyz/ for FASTQ files
Found 622 files
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/bin/catfishq", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/catfishq/cat_fastq.py", line 320, in main
    format_fq(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/catfishq/cat_fastq.py", line 285, in format_fq
    for entry in parse_fastqs(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/catfishq/cat_fastq.py", line 201, in parse_fastqs
    with pysam.FastxFile(filename) as fh:
  File "pysam/cfaidx.pyx", line 456, in pysam.cfaidx.FastxFile.__cinit__
  File "pysam/cfaidx.pyx", line 478, in pysam.cfaidx.FastxFile._open
  File "pysam/cutils.pyx", line 107, in pysam.cutils.encode_filename
TypeError: Argument must be string or unicode.
Thank you.
Allows carrying information over into the BAM file when running minimap2 with -y.
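minimap2's -y option copies the FASTA/FASTQ comment into its output, so for the result to remain valid SAM the comment should already consist of TAB-separated SAM tags. A minimal sketch of building such a header. The helper name and the example tags are hypothetical, every value is written with the string type Z, and whether catfishq formats comments this way is not confirmed here.

```python
import re
from typing import Mapping

# SAM tag names are two characters: a letter followed by a letter or digit.
_SAM_TAG = re.compile(r"[A-Za-z][A-Za-z0-9]")


def header_with_sam_tags(read_id: str, tags: Mapping[str, str]) -> str:
    """Build a FASTQ header line whose comment is a TAB-separated list of SAM tags."""
    fields = []
    for tag, value in tags.items():
        if not _SAM_TAG.fullmatch(tag):
            raise ValueError(f"invalid SAM tag name: {tag!r}")
        if "\t" in value:
            raise ValueError(f"tab not allowed in value for {tag}")
        fields.append(f"{tag}:Z:{value}")  # string-typed tag
    return "\t".join([f"@{read_id}", *fields])
```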
When --max-sequencing-time is used without --start-time-min, the default behaviour is to ignore --max-sequencing-time.
Suggestion: set 'start-time-min' to true whenever --x-sequencing-time is used without --start-time.
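The suggested behaviour can be sketched as a post-processing step after argument parsing. The option names follow the issue text, but the argument types and the store_true form of --start-time-min are assumptions, not catfishq's actual CLI definitions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical subset of options, named after the issue text.
    parser = argparse.ArgumentParser(prog="catfishq")
    parser.add_argument("--max-sequencing-time", type=int, default=None)
    parser.add_argument("--start-time-min", action="store_true")
    args = parser.parse_args(argv)
    # Proposed: a sequencing-time cutoff implies --start-time-min
    # instead of being silently ignored.
    if args.max_sequencing_time is not None and not args.start_time_min:
        args.start_time_min = True
    return args
```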
...or figure out how to make it faster in Python
catfishq/catfishq/cat_fastq.py
Line 49 in 4c42039
Possible alternatives are click or argh (?)
Is pyfastx faster than pysam for FASTQ parsing?
catfishq/catfishq/cat_fastq.py
Line 36 in 4c42039
Check whether pysam returns numpy arrays. If it does, use numpy to compute probabilities from phred scores more efficiently.
Alternative: a Cython implementation of the q-score computation.
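Short of numpy or Cython, one stdlib speedup is to precompute the phred-to-error-probability conversion once per possible quality byte, so 10**x is never evaluated per base. A minimal sketch, assuming phred+33 encoding and a mean q-score computed by averaging error probabilities rather than phred values; whether catfishq computes it this way is not confirmed here. A numpy version would vectorise the same table lookup.

```python
import math

# Error probability for every possible quality byte (phred+33 encoding).
# Bytes below 33 are not valid quality characters and map to NaN.
_ERROR_PROB = [10 ** (-(b - 33) / 10) if b >= 33 else math.nan for b in range(256)]


def mean_qscore(quality: str) -> float:
    """Mean q-score from averaged error probabilities (NaN for an empty read)."""
    if not quality:
        return math.nan
    raw = quality.encode("ascii")  # iterating bytes yields ints 0..255
    mean_error = sum(map(_ERROR_PROB.__getitem__, raw)) / len(raw)
    return -10 * math.log10(mean_error)
```

Averaging probabilities means a few very low-quality bases pull the mean q-score down much more than averaging the raw phred values would.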
Line 19 in 4c42039
catfishq/catfishq/cat_fastq.py
Line 241 in 4c42039
Is it faster? Also pysam is a pretty massive dependency.