solomonik / jaccard-ctf
Cyclops code for computing Jaccard similarity matrix for metagenome analysis
Hello again,
I'm getting fairly long runtimes with fasta_reader, and I think it might be because tasks are not being evenly allocated. Below is a run with ten processes. Every process but the last two finishes almost immediately. The dataset is two reference genomes in FASTA format (3.1 MB), and the output k-mer files look fine.
srun -n 10 ~/jaccard-ctf/fasta_reader -lfile input_list.txt -k 21 -infolderPath genomes/test/ -outfolderPath kmerFiles/
Total time taken: 0.0390079
Total time taken: 0.039084
Total time taken: 0.039016
Total time taken: 0.0392361
Total time taken: 0.039237
Total time taken: 0.039309
Total time taken: 0.0393519
Total time taken: 0.0394399
Total time taken: 115.406
Total time taken: 152.178
The pattern is the same no matter how many processes I use. I also tried 68 processes on a 20-file, 10 GB dataset, but once again only the last two ranks had any significant runtime.
One key difference you might have noticed is that I'm using srun. Due to some restrictions on my server, I built ctf/jaccard/fasta_reader with cray-mpich/7.7.10. The support specialists I've spoken with said srun should behave the same as mpirun under my current build. Regardless, do you think cray-mpich could be the reason fasta_reader is limiting itself to two processes?
Thanks,
Brett
Hi, I am trying to install this tool but have not been able to. I followed the instruction in the README that says to use ./configure --no-dynamic && make, but there is no ./configure file. Please advise. Thank you.
Hello,
I'm trying to run ./jaccard on a collection of metagenomes. What is the correct input configuration for a directory of FASTA files, and how do I specify the k-mer length if my reads vary in length?
I ran a test on two small FASTA files with the following:
mpirun -np 1 ~/jaccard-ctf/jaccard -lfile input_list.txt -f test_data/ -m 100000 -n 2 -nbatch 1
input_list.txt:
SRS011061.fa
SRS011086.fa
ls test_data:
SRS011061.fa
SRS011086.fa
However, I don't see a jaccard matrix in the output:
read k-mers, batchNo: 0 non_zero_rows: 0 time: 0.06
masks created with zero rows removed if compression is enabled, batchNo: 0 time: 0.00
J constructed, batchNo: 0 J.nnz_tot: 0 J.nrow: 3125
J write time: 0.13
Batch complete, batchNo: 0 time for jaccard_acc(): 0.04
S matrix computed for the specified input dataset
Am I doing this correctly? Is 0.04 the jaccard index?
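For reference, here is a minimal sketch of what a Jaccard index over k-mers looks like. It is plain Python written only for illustration and is not taken from the jaccard-ctf source; the helper names kmers and jaccard, the toy sequences, and k = 3 are all assumptions. In the log above, the 0.04 is printed next to "time for jaccard_acc()", so it reads as a timing rather than a similarity value.

```python
def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq (hypothetical helper)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty range, hence an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets; defined here as 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy sequences, not real data.
s1 = "ACGTACGT"
s2 = "ACGTTCGT"
k = 3

a = kmers(s1, k)
b = kmers(s2, k)
similarity = jaccard(a, b)
print(f"shared k-mers: {sorted(a & b)}, Jaccard index: {similarity:.4f}")
```

A Jaccard index always lies between 0 and 1, and any read shorter than k contributes no k-mers at all in this sketch.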