solomonik / jaccard-ctf

Cyclops code for computing Jaccard similarity matrix for metagenome analysis

Makefile 1.63% C++ 97.72% C 0.65%

jaccard-ctf's Issues

Fasta Reader Strange MPI Scaling

Hello again,
I'm getting fairly long runtimes with fasta_reader. I think it might have to do with tasks not being evenly allocated. Below is a run with ten processes. Every process but the last two finishes almost immediately. The dataset is two reference genomes in FASTA format (3.1 MB), and the output k-mer files look fine.

srun -n 10 ~/jaccard-ctf/fasta_reader -lfile input_list.txt -k 21 -infolderPath genomes/test/ -outfolderPath kmerFiles/
Total time taken: 0.0390079
Total time taken: 0.039084
Total time taken: 0.039016
Total time taken: 0.0392361
Total time taken: 0.039237
Total time taken: 0.039309
Total time taken: 0.0393519
Total time taken: 0.0394399
Total time taken: 115.406
Total time taken: 152.178

The pattern is the same no matter how many processes I run. I also tried 68 processes on a 20-file, 10 GB dataset, and once again only the last two ranks had any significant runtime.
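To put a number on the imbalance described above, the per-rank "Total time taken" lines can be parsed and summarized. The following is a minimal sketch, not part of jaccard-ctf: the function name `summarize_rank_times` and the `busy_fraction` threshold are hypothetical, and it assumes each rank prints exactly one timing line in the format shown in the run above.

```python
import re
from statistics import mean

# Matches lines such as "Total time taken: 0.0390079" (format taken from the run above).
TIME_LINE = re.compile(r"Total time taken:\s*([0-9]+(?:\.[0-9]+)?)")

# Timings copied from the ten-process run reported in this issue.
SAMPLE = """\
Total time taken: 0.0390079
Total time taken: 0.039084
Total time taken: 0.039016
Total time taken: 0.0392361
Total time taken: 0.039237
Total time taken: 0.039309
Total time taken: 0.0393519
Total time taken: 0.0394399
Total time taken: 115.406
Total time taken: 152.178
"""


def summarize_rank_times(log_text, busy_fraction=0.5):
    """Summarize per-rank runtimes (hypothetical helper, not a jaccard-ctf API).

    A rank counts as "busy" when its runtime is at least busy_fraction
    of the slowest rank's runtime; the threshold is an arbitrary choice.
    """
    times = [float(m.group(1)) for m in TIME_LINE.finditer(log_text)]
    if not times:
        raise ValueError("no 'Total time taken' lines found")
    slowest = max(times)
    average = mean(times)
    busy = sum(1 for t in times if t >= busy_fraction * slowest)
    return {
        "ranks": len(times),
        "max": slowest,
        "mean": average,
        "imbalance": slowest / average,  # 1.0 would mean perfectly even work
        "busy_ranks": busy,
    }


if __name__ == "__main__":
    for key, value in summarize_rank_times(SAMPLE).items():
        print(f"{key}: {value}")
```

An imbalance ratio well above 1 together with a small busy-rank count is consistent with the observation that only two ranks do real work, but it does not by itself say why the reader distributes work that way.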

One key difference you might have noticed is that I'm using srun. Due to some restrictions on my server, I built ctf/jaccard/fasta_reader with cray-mpich/7.7.10. The support specialists I've spoken with said srun should act the same as mpirun under my current build. Regardless, do you think cray-mpich might have something to do with fasta_reader limiting itself to two processes?

Thanks,
Brett

help with the installation

Hi, I am trying to install this tool but have not been able to. I followed the instructions in the README, which say to use ./configure --no-dynamic && make, but there is no ./configure file. Please advise. Thank you.

Running with a collection of FASTA files

Hello,
I'm trying to run ./jaccard on a collection of metagenomes. What is the correct input configuration for a directory of FASTA files, and how do I specify the k-mer length if my reads vary in length?

I ran a test on two small FASTA files with the following:
mpirun -np 1 ~/jaccard-ctf/jaccard -lfile input_list.txt -f test_data/ -m 100000 -n 2 -nbatch 1

input_list.txt:
SRS011061.fa
SRS011086.fa

ls test_data:
SRS011061.fa
SRS011086.fa

However, I don't see a jaccard matrix in the output:
read k-mers, batchNo: 0 non_zero_rows: 0 time: 0.06
masks created with zero rows removed if compression is enabled, batchNo: 0 time: 0.00
J constructed, batchNo: 0 J.nnz_tot: 0 J.nrow: 3125 J write time: 0.13
Batch complete, batchNo: 0 time for jaccard_acc(): 0.04
S matrix computed for the specified input dataset

Am I doing this correctly? Is 0.04 the Jaccard index?
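For reference, the log labels 0.04 as "time for jaccard_acc()", so by its own wording it reads as a timing rather than a similarity value. The Jaccard index between two samples is the size of the intersection of their k-mer sets divided by the size of their union. The sketch below illustrates that definition on two toy sequences; it is independent of how jaccard-ctf computes the matrix, and the helper names `kmers` and `jaccard` as well as the example sequences are made up for illustration.

```python
def kmers(sequence, k):
    """Return the set of distinct k-mers (length-k substrings) in a sequence.

    A sequence shorter than k yields an empty set, so reads shorter than
    the chosen k contribute nothing under this definition.
    """
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Toy sequences, not real data.
    set_a = kmers("ACGTACGT", 3)  # {ACG, CGT, GTA, TAC}
    set_b = kmers("ACGTTTGT", 3)  # {ACG, CGT, GTT, TTT, TTG, TGT}
    print(jaccard(set_a, set_b))  # 2 shared k-mers out of 8 distinct: 0.25
```

The result always lies between 0 and 1, with 1 meaning the two k-mer sets are identical.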
