
mhap's People

Contributors

konstantinberlin, red54, skoren

mhap's Issues

Issues in help output

Error: no input or process files specified
Usage 1 (direct execution): MHAP -s<fasta/dat from/self file> [-q<fasta/dat to file>] [-f<kmer filter list, must be sorted>]
Usage 2 (generate precomputed binaries): MHAP -p<directory of fasta files> -q <output directory> [-f<kmer filter list, must be sorted>]
Options: 
     -k [int merSize], default: 16
      --memory [do not store kmers in memory]
      --num-hashes [int # hashes], default: 1024
      --min-store-length [int # of minimum sequence length that is hashed], default: 0
      --threshold [int threshold for % matching minimums], default: 0.05

The type is int but the default is double 0.05.

      --max-shift [double fraction of the overlap size where shift in k-mer match is still considered valid], default: 0.2
      --num-min-matches [int # hashes that maches before performing local alignment], default: 3

maches should be matches.

      --num-threads [int # threads to use for computation], default (2 x #cores): 8
      --subsequence-size [depricated, int size of maximum minhashed sequence], default: 100000

depricated should be deprecated.

      --no-self [do not compute results to self], default: false
      --store-full-id [use full sequence id rather than order in file], default: false
      --threshold [int threshold for % matching minimums], default: 0.05
      --max-shift [int # max sequence shift allowed for a valid kmer relative to median value], default: 0.2

These two options are duplicated above.
The types are int but the default values are double.
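
The duplicate check described above can be automated. The following is a minimal sketch, assuming the help text is available as a string and that each option line starts with whitespace followed by a `-` or `--` flag, as in the output quoted in this issue. The function name `duplicated_options` is illustrative and not part of MHAP.

```python
import re
from collections import Counter

# Matches an option name at the start of a help line, e.g. "  --threshold [int ...]".
OPTION_RE = re.compile(r"^\s*(--?[A-Za-z][\w-]*)")


def duplicated_options(help_text):
    """Return option names that are listed more than once, in first-seen order."""
    counts = Counter()
    for line in help_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return [name for name, n in counts.items() if n > 1]


# Shortened copy of the help output quoted in this issue.
HELP = """\
Options:
     -k [int merSize], default: 16
      --threshold [int threshold for % matching minimums], default: 0.05
      --max-shift [double fraction of the overlap size], default: 0.2
      --no-self [do not compute results to self], default: false
      --threshold [int threshold for % matching minimums], default: 0.05
      --max-shift [int # max sequence shift allowed], default: 0.2
"""

print(duplicated_options(HELP))
```

Feeding the real help output through the same function would be one way to catch repeated entries before a release.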

Output format

It is not very clear what the output values actually represent.

I've found in the docs that a record looks like
[A ID] [B ID] [Jaccard score] [# shared min-mers] [0=A fwd, 1=A rc] [A start] [A end] [A length] [0=B fwd, 1=B rc] [B start] [B end] [B length]
so I have a few questions.

  • Are end values inclusive or exclusive?
  • If the overlap looks like
--------       A fwd
     --------  B rc

what would the approximate values for read B be?
Would start be 7 and end 10, or would they be 0 and 3, respectively?

I am sorry if this is obvious; I am coming from the AMOS world, where everything goes crazy.
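
For working with these records, a small parser can help. This is a minimal sketch that assumes the 12-column order quoted from the docs above and whitespace-separated fields. The `Overlap` type and `parse_overlap` function are hypothetical names, and the example record is made up. It does not settle whether end coordinates are inclusive or how coordinates on a reverse-complemented read are reported.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    a_id: str
    b_id: str
    score: float          # third column, labelled "Jaccard score" in the docs
    shared_min_mers: int
    a_rc: bool            # False = A forward, True = A reverse complement
    a_start: int
    a_end: int
    a_length: int
    b_rc: bool            # False = B forward, True = B reverse complement
    b_start: int
    b_end: int
    b_length: int


def parse_overlap(line):
    """Parse one whitespace-separated 12-column record into an Overlap."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return Overlap(
        f[0], f[1], float(f[2]),
        int(float(f[3])),  # accepts "42" as well as "42.0"
        f[4] == "1", int(f[5]), int(f[6]), int(f[7]),
        f[8] == "1", int(f[9]), int(f[10]), int(f[11]),
    )


# Made-up example record, not real MHAP output.
rec = parse_overlap("155 11 0.87 42 0 100 5000 6000 1 0 4900 7000")
print(rec.b_rc, rec.b_start, rec.b_end)
```

Named fields make it easier to test the inclusive/exclusive question empirically, for example by comparing `a_end` against `a_length` for overlaps that reach the end of a read.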

multiple alignments to reference

Hello,
Just out of curiosity, when I use MHAP to map SMRTseq reads to a reference (which it was not designed for, I know), how does MHAP deal with reads that might align to multiple locations?
I cut the reference into 40 kb pieces.
Is MHAP reporting:
a) the best alignment (above some threshold) in each of those 40 kb pieces, or
b) only the best alignment over all of the 40 kb pieces (the best hit over the whole genome)?

I would like to know because, if b) is not the case, it would be easy to implement. Would it be enough to take the output from a) and look for the best hit for each read?

Thank you,
Michel
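
The post-processing proposed in this question (take the per-piece output and keep the best hit for each read) could be sketched as follows. Several things here are assumptions: that the output has the 12-column layout described in the "Output format" issue on this page, that the read ID is in the second column (this depends on which file was given as -s and which as -q), and that a higher third-column value means a better hit. The function name and example records are made up.

```python
def best_hit_per_read(lines, read_col=1, score_col=2, higher_is_better=True):
    """Keep, for each read ID, the single record with the best score."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        read_id = fields[read_col]
        score = float(fields[score_col])
        if not higher_is_better:
            score = -score
        if read_id not in best or score > best[read_id][0]:
            best[read_id] = (score, line.rstrip("\n"))
    return {read_id: line for read_id, (_, line) in best.items()}


# Made-up records: reference pieces 1 and 2 (column 1), reads 7 and 8 (column 2).
records = [
    "1 7 0.30 10 0 0 100 40000 0 0 100 900",
    "2 7 0.55 20 0 50 400 40000 1 0 350 900",
    "2 8 0.40 15 0 10 300 40000 0 5 295 800",
]
for read_id, line in best_hit_per_read(records).items():
    print(read_id, "->", line)
```

If the score column turns out to be an error estimate rather than a similarity, passing `higher_is_better=False` flips the comparison.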

N-gram size bigger than string length

Hi,

I used MHAP to compute the overlaps between contigs and PacBio long reads, but an "N-gram size bigger than string length" exception occurred.

Here is the command to run MHAP:

java -Xmx40g -jar $HOME/MHAP/target/mhap-2.0.jar -s asm.newbler454.fasta -q lr4.fa --no-self >mhap.out 2> mhap.err
where asm.newbler454.fasta is the assembled contig file and lr4.fa is the PacBio long-read file.

However, mhap.out is empty.

The following error output was stored in mhap.err:

`Running with these settings:
--alignment = false
--alignment-offset = -0.53
--alignment-score = 1.0E-6
--filter-threshold = 1.0E-5
--help = false
--max-shift = 0.2
--min-store-length = 0
--nanopore = false
--no-self = true
--num-hashes = 512
--num-min-matches = 3
--num-threads = 24
--ordered-kmer-size = 12
--ordered-sketch-size = 1536
--pacbio-fast = false
--pacbio-sensitive = false
--store-full-id = false
--threshold = 0.78
--version = false
--weighted = false
-f =
-h = false
-k = 16
-p =
-q = lr4.fa
-s = asm.newbler454.fasta

Processing files for storage in reverse index...
Stored 186 sequences in the index.
Processed 186 unique sequences (fwd and rev).
Time (s) to read and hash from file: 4.647191542000001
Opened fasta file /homec/liweilong/data/Ecoli/lr4.fa.
Exception in thread "pool-3-thread-22" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-8" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-14" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-1" Exception in thread "pool-3-thread-3" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-6" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-13" Exception in thread "pool-3-thread-24" Exception in thread "pool-3-thread-12" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-15" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-4" Exception in thread "pool-3-thread-9" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-18" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-23" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-11" Exception in thread "pool-3-thread-17" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-7" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-10" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-16" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-20" Exception in thread "pool-3-thread-21" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-19" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-2" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-3-thread-5" edu.umd.marbl.mhap.sketch.SketchRuntimeException: N-gram size bigger than string length.
at edu.umd.marbl.mhap.sketch.MinHashSketch.computeNgramMinHashesWeighted(MinHashSketch.java:57)
at edu.umd.marbl.mhap.sketch.MinHashSketch.<init>(MinHashSketch.java:199)
at edu.umd.marbl.mhap.impl.SequenceSketch.<init>(SequenceSketch.java:107)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.getSketch(SequenceSketchStreamer.java:231)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.enqueue(SequenceSketchStreamer.java:129)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.dequeue(SequenceSketchStreamer.java:114)
at edu.umd.marbl.mhap.impl.AbstractMatchSearch$3.run(AbstractMatchSearch.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Processed 1080 to sequences.
Time (s) to score, hash to-file, and output: 0.802685285
Total scoring time (s): 0.805969366
Total time (s): 5.4533288760000005
MinHash search time (s): 0.225948808
Total matches found: 0
Average number of matches per lookup: 0.0
Average number of table elements processed per lookup: 5.187962962962963
Average number of table elements processed per match: Infinity
Average % of hashed sequences hit per lookup: 0.6635802469135803
Average % of hashed sequences hit that are matches: 0.0
Average % of hashed sequences fully compared that are matches: 0.0

`

Could you help me with this please?
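
One reading of the exception message is that at least one sequence in lr4.fa is shorter than the k-mer (n-gram) size; the settings above show -k = 16. That cause is an assumption, not something the log confirms. Under that assumption, a minimal sketch for dropping short FASTA records before running MHAP could look like this (the function names and example sequences are made up):

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Made-up input: r2 is shorter than k = 16, r3 spans two lines.
fasta = [">r1", "ACGTACGTACGTACGTACGT", ">r2", "ACGT",
         ">r3", "ACGTACGT", "ACGTACGTAC"]
kept = drop_short(read_fasta(fasta), min_length=16)
print([h for h, _ in kept])
```

Note that the settings show --store-full-id = false, and the help text describes that option as using the full sequence id rather than the order in the file, so removing records would shift the IDs reported in the output.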

Unknown Source Exception

Hello,
I am trying to map PacBio reads against a reference.
Unfortunately, it always terminates with an exception that shows "Unknown Source" in the stack trace.
Could you give me details on how to resolve that?

Thank you!
Michel

/data/users/mmoser/MHAP/jdk1.8.0_20/bin/java -jar mhap-0.1.jar -s ../../../Petunia_correct/PacBio-gDNA2013/SMRTcells/m130907_000643_42182_c100579332550000001823097501191470_s1_p0.fasta -f ../../../HGAPmt/N_tabacum_mtdna.fasta 
Running with input fasta: ../../../Petunia_correct/PacBio-gDNA2013/SMRTcells/m130907_000643_42182_c100579332550000001823097501191470_s1_p0.fasta
Running with process directory: null
Running with to directory or file: null
Running with kmer filter file: ../../../HGAPmt/N_tabacum_mtdna.fasta
kmer size:  16
kmer filter percent cutoff: 1.0E-5
num hashed words:   1024
num min matches:    3
min hashed seq length:  0
subsequence size:   100000
max shift:  0.2
threshold:  0.05
number of threads:  24
store full sequence ids:    false
compute alignment to self of -s file:   true
Reading in filter file ../../../HGAPmt/N_tabacum_mtdna.fasta.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at edu.umd.marbl.mhap.utils.Utils.createKmerFilter(Unknown Source)
    at edu.umd.marbl.mhap.main.MhapMain.main(Unknown Source)
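
The help text quoted in the first issue on this page describes -f as a "kmer filter list, must be sorted", whereas the command above passes a FASTA file to it. An `ArrayIndexOutOfBoundsException: 1` would be consistent with code that splits each line into columns and reads the second one, but that is a guess about `createKmerFilter`, not something confirmed by the trace. Under that assumption, a quick check for lines that lack a second column could be sketched like this (the two-column example format is hypothetical):

```python
def lines_missing_second_column(lines):
    """Return 1-based numbers of non-empty lines with fewer than two fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and len(fields) < 2:
            bad.append(number)
    return bad


# A FASTA file has one field per line; the table layout is only an assumption.
fasta_like = [">N_tabacum_mtdna", "ACGTACGTACGT", "TTGACCA"]
table_like = ["ACGTACGTACGTACGT 0.002", "TTGACCATTGACCATT 0.001"]
print(lines_missing_second_column(fasta_like))
print(lines_missing_second_column(table_like))
```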

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/compress

I used the pre-compiled jar file. When I run "java -Xmx70g -server -jar mhap-1.6.jar -s Ery_PacBio.full.fasta --pacbio-sensitive --max-shift 0.1 --num-threads 24", I get the error below. Do I need to compile from source?

Processing files for storage in reverse index...
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream
at edu.umd.marbl.mhap.impl.FastaData.<init>(Unknown Source)
at edu.umd.marbl.mhap.impl.SequenceSketchStreamer.<init>(Unknown Source)
at edu.umd.marbl.mhap.main.MhapMain.getSequenceHashStreamer(Unknown Source)
at edu.umd.marbl.mhap.main.MhapMain.computeMain(Unknown Source)
at edu.umd.marbl.mhap.main.MhapMain.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
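
A `NoClassDefFoundError` for an Apache Commons Compress class means that library was not on the classpath at run time. One possibility, which is an assumption here, is that the jar's manifest lists its dependencies in a `Class-Path` entry relative to the jar's own folder and those files were not copied along with it. The sketch below reads that manifest entry and reports which listed files are missing; the demo jar and its dependency name are made up.

```python
import os
import tempfile
import zipfile


def manifest_class_path(jar_path):
    """Return the Class-Path entries declared in a jar's manifest."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    # Manifest values may wrap; continuation lines start with one space.
    logical = []
    for line in text.splitlines():
        if line.startswith(" ") and logical:
            logical[-1] += line[1:]
        else:
            logical.append(line)
    for line in logical:
        if line.startswith("Class-Path:"):
            return line[len("Class-Path:"):].split()
    return []


def missing_dependencies(jar_path):
    """List Class-Path entries that do not exist relative to the jar's folder."""
    base = os.path.dirname(os.path.abspath(jar_path))
    return [entry for entry in manifest_class_path(jar_path)
            if not os.path.exists(os.path.join(base, entry))]


# Demo with a throwaway jar whose manifest names a hypothetical dependency.
with tempfile.TemporaryDirectory() as tmp:
    demo_jar = os.path.join(tmp, "demo.jar")
    with zipfile.ZipFile(demo_jar, "w") as z:
        z.writestr("META-INF/MANIFEST.MF",
                   "Manifest-Version: 1.0\nClass-Path: lib/commons-compress.jar\n")
    print(missing_dependencies(demo_jar))
```

If the real jar reports missing entries, placing those files where the manifest expects them (or passing them with `-cp`) would be the thing to try before recompiling.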

Unsupported major.minor version 52.0

Hello,
I ran into Unsupported major.minor version 52.0 while creating a bioconda package.

What could cause this problem with the 2.1.1 release? Do I require Oracle's Java?

Thank you in advance.

Mic
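
Class-file version 52.0 corresponds to Java 8, so this message typically means the classes were compiled for Java 8 and are being loaded by an older runtime. The version number is independent of the Java vendor. A minimal sketch for reading the version from a class-file header follows; the function names are illustrative and the header bytes are constructed by hand rather than taken from the MHAP jar.

```python
import struct


def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def required_java(major):
    """Map a class-file major version to the Java release that introduced it."""
    # Java 5 uses major version 49, and each release since adds one.
    if major < 49:
        raise ValueError("class file older than Java 5")
    return major - 44


# Hand-built header: magic number, minor version 0, major version 0x34 (52).
header = bytes.fromhex("cafebabe00000034")
major, minor = class_file_version(header)
print(major, minor, required_java(major))
```

Reading the first eight bytes of any `.class` entry extracted from a jar with `class_file_version` shows which Java release the build environment needs to provide.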

BUILD FAILED

The build fails on OS X Yosemite with java version "1.8.0_60", javac 1.6.0_65 (as reported by javac -version), and Apache Ant(TM) version 1.9.6 compiled on June 29 2015.

I could not find the compiler error output that the log mentions (if you tell me where to find it, I can add it).
Log:

Buildfile: /Users/kjaron/Documents/PacBio/MHAP/MHAP/build.xml

clean:
[delete] Deleting directory /Users/kjaron/Documents/PacBio/MHAP/MHAP/target

init:
[echo] -------- MHAP 1.6 --------
[mkdir] Created dir: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target
[mkdir] Created dir: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/classes
[mkdir] Created dir: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/classes/bin
[mkdir] Created dir: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/classes/properties
[mkdir] Created dir: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/test-classes
[copy] Copying 1 file to /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/classes

copy-lib:
[mkdir] Created dir: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/lib
[copy] Copying 4 files to /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/lib

buildinfo:
[propertyfile] Creating new property file: /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/classes/properties/mhap.properties

compile:
[javac] Compiling 56 source files to /Users/kjaron/Documents/PacBio/MHAP/MHAP/target/classes
[javac] javac: invalid target release: 1.8
[javac] Usage: javac
[javac] use -help for a list of possible options

BUILD FAILED
/Users/kjaron/Documents/PacBio/MHAP/MHAP/build.xml:155: Compile failed; see the compiler error output for details.

Total time: 0 seconds
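
The log reports "invalid target release: 1.8" while javac -version reports 1.6.0_65, which suggests that Ant is picking up a Java 6 compiler that cannot emit Java 8 class files. A minimal sketch of that comparison follows; java_feature_version and can_target are hypothetical helpers written for illustration and are not part of Ant or MHAP.

```python
import re


def java_feature_version(version):
    """Map a Java version string to its feature number.

    '1.6.0_65' -> 6, '1.8' -> 8, '11.0.2' -> 11.
    """
    parts = [int(p) for p in re.split(r"[._]", version.strip()) if p.isdigit()]
    if not parts:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Legacy scheme: 1.x means Java x.
    if parts[0] == 1 and len(parts) > 1:
        return parts[1]
    return parts[0]


def can_target(javac_version, target):
    """A compiler can only target releases up to its own feature version."""
    return java_feature_version(javac_version) >= java_feature_version(target)


# Values taken from the report above.
print(can_target("1.6.0_65", "1.8"))  # compiler found by Ant
print(can_target("1.8.0_60", "1.8"))  # version reported by java -version
```

Since java and javac report different versions here, pointing the build at the JDK 8 compiler (for example through JAVA_HOME) would be the first thing to try.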

Installation problem.

I tried installing MHAP, but when I change into the folder and run maven install, it shows the following output. Could you please help?

saad@Mercenary:~/MHAP$ maven install
No command 'maven' found, did you mean:
Command 'aven' from package 'survex-aven' (universe)
maven: command not found
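
The shell output shows that no executable named maven exists on this machine. Apache Maven's command-line launcher is called mvn, so the command would normally be typed as mvn install once Maven is installed. The sketch below checks which launcher name is available on PATH; first_available is a hypothetical helper, not part of Maven or MHAP.

```python
import shutil


def first_available(candidates, lookup=shutil.which):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = lookup(name)
        if path:
            return name, path
    return None


# "mvn" is Maven's launcher; "maven" is listed only to mirror the command
# that was tried in the report above.
found = first_available(["mvn", "maven"])
if found:
    print(f"use: {found[0]} install  (found at {found[1]})")
else:
    print("no Maven launcher found on PATH; install Maven first")
```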

out of memory error

Hi, thanks for your implementation.
I ran the jar file with the command below to find overlaps among yeast reads:
java -Xmx32g -server -jar mhap-2.1.1.jar -s yeast_filtered.fa
But after a few minutes, this error occurred:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2ObjectOpenHashMap.<init>(Int2ObjectOpenHashMap.java:126)
at it.unimi.dsi.fastutil.ints.Int2ObjectOpenHashMap.<init>(Int2ObjectOpenHashMap.java:136)
at edu.umd.marbl.mhap.impl.MinHashSearch.<init>(MinHashSearch.java:89)
at edu.umd.marbl.mhap.main.MhapMain.getMatchSearch(MhapMain.java:550)
at edu.umd.marbl.mhap.main.MhapMain.computeMain(MhapMain.java:454)
at edu.umd.marbl.mhap.main.MhapMain.main(MhapMain.java:313)

The error occurred after 460000 sequences had been loaded and processed (when this line was printed in the output: Current # sequences loaded and processed from file: 460000...).

  1. Could you help me find the reason for this error?
  2. Am I using the right command for finding overlaps?
  3. How can I solve this error?

Thanks.

MHAP running time

Hello,
I would like to ask whether there is a way to estimate the running time of MHAP with the Celera Assembler on a single machine. I have 14x 2 Gbases of PacBio reads and a machine with 80 CPUs and 1 TB of memory.
Does the ovlRefBlockSize parameter have an impact on speed?

Thank you

Problem with jellyfish step

Hello!
I'm using MHAP as part of the PBcR pipeline, so I'm not sure if this is the place to ask, but thought I'd try anyway. I'm testing a range of parameters for MHAP and certain combos seem to fail early on during the jellyfish step, while others pass through it just fine. Here is a typical output:

Running with 40.0000115076923X (for genome size 130000000) of dmel_pacbio sequences (5200001496 bp).
Correcting with 121X sequences (15743312583 bp).
----------------------------------------START Fri Jul 17 21:34:36 2015
/gpfs/fs1/sfw/wgs/8.3/Linux-amd64/bin/jellyfish count -m 10 -s 120000000 -t 24 -o /scratch/dkhost/lab_shared/dmel_MHAP_array/1_10_1000.params//tempdmel_pacbio/asm.mers /scratch/dkhost/bergman_pacbio_reads/pacbio_reads_vs_celera25X_filtered+BLASR/data/filtered_subreads.fastq
Failed to open input file '/scratch/dkhost/lab_shared/dmel_MHAP_array/1_10_1000.params//tempdmel_pacbio/asm.mers1021'
----------------------------------------END Sun Jul 19 04:37:28 2015 (111772 seconds)
Failed to execute /gpfs/fs1/sfw/wgs/8.3/Linux-amd64/bin/jellyfish count -m 10 -s 120000000 -t 24 -o /scratch/dkhost/lab_shared/dmel_MHAP_array/1_10_1000.params//tempdmel_pacbio/asm.mers /scratch/dkhost/bergman_pacbio_reads/pacbio_reads_vs_celera25X_filtered+BLASR/data/filtered_subreads.fastq

As far as I can tell, that file does exist, and other combos run OK. The problem seems to occur with small k-mers or much larger k-mers; perhaps it is a memory issue? This run had kmer=10 and sketch size=1000; I can give you the whole spec file if needed. Any suggestions would be greatly appreciated!
Best,
Emerson

Merge all subreads (in one ZMW) into one CCS sequence before using MHAP

Hi,
When PBcR maps all filtered_subread sequences to the SEEDS long reads with MHAP, the SEEDS usually still contain other subreads from the same ZMW.
So what if all subreads from the same ZMW were merged into one CCS sequence for the SEEDS, the CCS sequences were then combined with the other seeds as SEEDS, and finally MHAP were used to compute the overlaps?

Sequence ordinal in results is i+1 for i in the range 1 to N

Version 2.1.1

The ordinals in the first column of MHAP output seem to be consistently 1 too high, assuming they are supposed to represent the query sequence positions in the pacbio input file in the range 1 to N.

Example: make a query file "just_two.fasta" which contains two pacbio sequences, each of which matches somewhere in a single target sequence "target.fasta".

Align the query reads to the target sequence:

/m120317_230406_00121_c100309752550000001523013008061277_s1_p0.fasta >just_two.fasta
/usr/lib/jvm/java-1.8.0-openjdk.x86_64/bin/java -jar ~/src/MHAP/target/mhap-2.1.1.jar \
    -s /tmp/target.fasta -q  just_two.fasta \
    --num-threads 2 2>/dev/null

Results are:

3 1 0.116346 27.000000 0 69 692 1043 1 38 593 6655
2 1 0.120035 31.000000 0 2130 2842 2872 0 0 645 6655

The first two values should have been 2 and 1.

This was initially noticed with much larger numbers, where the pacbio reads were in positions 69233 and 69234 but were numbered as 69234 and 69235.
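
Until the numbering is clarified, the observed offset could be removed in post-processing. The sketch below is only an illustration under the assumption stated in this report, namely that the first column is consistently 1 too high; whether that offset is a bug or intended behavior is not established here. shift_first_ordinal is a hypothetical helper, not part of MHAP.

```python
def shift_first_ordinal(line, offset=1):
    """Subtract `offset` from the first whitespace-separated column of a result line."""
    fields = line.split()
    if not fields:
        raise ValueError("empty result line")
    fields[0] = str(int(fields[0]) - offset)
    return " ".join(fields)


# The two result lines quoted above.
rows = [
    "3 1 0.116346 27.000000 0 69 692 1043 1 38 593 6655",
    "2 1 0.120035 31.000000 0 2130 2842 2872 0 0 645 6655",
]

for row in rows:
    print(shift_first_ordinal(row))
```

All other columns are passed through unchanged, so the sketch makes no assumption about their meaning.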

version 2.1?

I noticed that canu is bundling a version of mhap labeled 2.1, but there's no such release here. Could you please update the repository or push the corresponding tag?

Many thanks
