jaccard-ctf's People

Contributors

raghavendrak, solomonik, temprk1

jaccard-ctf's Issues

Running with a collection of FASTA files

Hello,
I'm trying to run ./jaccard on a collection of metagenomes. What is the correct input configuration for a directory of FASTA files and how do I specify k-mer length if my reads vary in length?

I ran a test on two small FASTA files with the following:
mpirun -np 1 ~/jaccard-ctf/jaccard -lfile input_list.txt -f test_data/ -m 100000 -n 2 -nbatch 1

input_list.txt:
SRS011061.fa
SRS011086.fa

ls test_data:
SRS011061.fa
SRS011086.fa

However, I don't see a jaccard matrix in the output:
read k-mers, batchNo: 0 non_zero_rows: 0 time: 0.06
masks created with zero rows removed if compression is enabled, batchNo: 0 time: 0.00
J constructed, batchNo: 0 J.nnz_tot: 0 J.nrow: 3125
J write time: 0.13
Batch complete, batchNo: 0 time for jaccard_acc(): 0.04
S matrix computed for the specified input dataset

Am I doing this correctly? Is 0.04 the jaccard index?
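
For reference, the log labels 0.04 as "time for jaccard_acc()", so it reads as a timing and not a similarity value, and "non_zero_rows: 0" together with "J.nnz_tot: 0" suggests that no k-mers were read from the input. As a point of comparison, here is a minimal Python sketch of what a Jaccard index between two k-mer sets looks like. It is not jaccard-ctf's implementation; the helper names kmers and jaccard and the toy sequences are made up for illustration. The k-mer length only needs to be no longer than the shortest sequence, so reads of varying length can share one k.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq.

    Hypothetical helper for illustration; sequences shorter than k
    simply yield an empty set.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets.

    Assumption: two empty sets are given a similarity of 0.0 here.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy sequences, not real data.
s1 = kmers("ACGTACGT", 3)
s2 = kmers("ACGTTCGT", 3)
similarity = jaccard(s1, s2)
print(similarity)
```

The result is always between 0 and 1, where 1 means the two samples share exactly the same k-mers.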

help with the installation

Hi, I am trying to install this tool, but I am not able to. I followed the instructions in the README, which say to use ./configure --no-dynamic && make, but there is no ./configure file. Please advise. Thank you.

Fasta Reader Strange MPI Scaling

Hello again,
I'm getting fairly long runtimes with the fasta_reader. I think it might have to do with tasks not being evenly allocated. Below is a run with ten processes. Every process but the last two finishes almost immediately. The dataset is two reference genomes in FASTA format (3.1 MB) and the output k-mer files look fine.

srun -n 10 ~/jaccard-ctf/fasta_reader -lfile input_list.txt -k 21 -infolderPath genomes/test/ -outfolderPath kmerFiles/
Total time taken: 0.0390079
Total time taken: 0.039084
Total time taken: 0.039016
Total time taken: 0.0392361
Total time taken: 0.039237
Total time taken: 0.039309
Total time taken: 0.0393519
Total time taken: 0.0394399
Total time taken: 115.406
Total time taken: 152.178

This is the same pattern no matter how many processes I run. I also tried 68 processes on a 20 file/10 GB dataset, but once again only the last two ranks had any significant runtime.

One key difference you might have noticed is that I'm using srun. Due to some restrictions on my server, I built ctf/jaccard/fasta_reader with cray-mpich/7.7.10. The support specialists I've spoken with said srun should act the same as mpirun under my current build. Regardless, do you think cray-mpich might have something to do with fasta_reader limiting itself to two processes?

Thanks,
Brett
