Comments (6)
You have the right idea catting to stdin. I added this step to the tutorial to clarify, but this could certainly be more elegant, maybe with a flag that pools all inputs into one sketch (possibly a behavior of the read flag -r).
If you want to try avoiding zcat, you can cat the gz files directly; mash will inflate each one in-stream.
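As a small illustration of why plain cat is enough here: concatenated gzip files form a valid multi-member gzip stream, so any in-stream decompressor sees all of the records. The sketch below uses gzip itself in place of mash, and the file names and reads are made up for the example.

```shell
# Build two tiny gzipped FASTQ files in a scratch directory (made-up reads).
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/a_1.fq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/a_2.fq.gz"

# Concatenate the compressed files as-is and decompress the combined stream.
lines=$(cat "$tmp/a_1.fq.gz" "$tmp/a_2.fq.gz" | gzip -dc | wc -l | tr -d ' ')
echo "$lines"   # 8 lines: both 4-line records survive

rm -rf "$tmp"
```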
from mash.
In the latest source, the -r flag combines all input files and allows filling the empty fields with -I and -C.
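For paired-end reads streamed through stdin, that might look like the sketch below. It assumes a Mash build recent enough to include -I, and a hypothetical naming scheme of <sample>_1.fq.gz / <sample>_2.fq.gz; sample_name and sketch_pairs are illustrative helpers, not part of Mash.

```shell
# Derive the sample name from a forward-read file name
# (assumed naming: <sample>_1.fq.gz).
sample_name() {
    basename "$1" _1.fq.gz
}

# Sketch each read pair from stdin, setting the ID field explicitly with -I
# so it is not left empty when the input is a pipe.
sketch_pairs() {
    for r1 in "$@"; do
        name=$(sample_name "$r1")
        r2="$(dirname "$r1")/${name}_2.fq.gz"
        cat "$r1" "$r2" | mash sketch -k 21 -s 10000 -r -I "$name" -o "$name" -
    done
}

# Example call (needs mash on PATH and real read files):
# sketch_pairs reads/*_1.fq.gz
# mash paste all_samples *.msh
```

Each resulting .msh should then carry its sample name as the ID, which can be checked with mash info.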
from mash.
I do seem to have this persistent core dump:
zcat SRR1262647_1.fastq.gz | ../mash-Linux64-v1.1/mash sketch -k 21 -r -
Sketching from stdin...
Segmentation fault
Same with cat:
cat SRR1262647_1.fastq.gz | ../mash-Linux64-v1.1/mash sketch -k 21 -r -
Sketching from stdin...
Segmentation fault
Works fine without piping:
../mash-Linux64-v1.1/mash sketch -k 21 -r SRR1262647_1.fastq.gz
Sketching SRR1262647_1.fastq.gz...
Estimated genome size: 7.01971e+08
Estimated coverage: 2.942
Writing to SRR1262647_1.fastq.gz.msh...
Any ideas?
from mash.
Looks like stdin input was broken in 1.1. A fix is now in the latest source and will be included in the next release.
from mash.
Should be fixed in v1.1.1.
from mash.
Hello,
cat sample_1.fq.gz sample_2.fq.gz | mash sketch -k 21 -s 10000 -r - -o sample
mash info sample.msh
output:
Header:
Hash function (seed): MurmurHash3_x64_128 (42)
K-mer size: 21 (64-bit hashes)
Alphabet: ACGT (canonical)
Target min-hashes per sketch: 10000
Sketches: 1
Sketches:
[Hashes] [Length] [ID] [Comment]
10000 815194870 - -
The ID is empty.
If we have many samples with paired-end fastq files, sketch each sample this way, paste all the sketches into a single file, and then run mash dist on it, every ID ends up empty.
We can do the following instead:
cat sample_1.fq.gz sample_2.fq.gz > sample.fq.gz
mash sketch -k 21 -s 10000 sample.fq.gz -o sample
This avoids the ID issue, but then we can't enjoy the pleasure of a Unix stream pipeline : )
Thanks to the author.
Such a great and creative tool!
from mash.