Giter VIP home page Giter VIP logo

Comments (8)

lh3 avatar lh3 commented on June 30, 2024 1

I have fixed the bug via 95eb1de. Thanks a lot for your example. It would be almost impossible without proper test cases.

By the way, usually I wouldn't recommend to perform alignment (option -c) in the overlapping mode. First, this is much slower. Second, it introduces false overlaps. On pacbio C. elegans reads, doing alignment leads to a few more misassemblies.

I am closing this issue. If you the see same problem again, feel free to reopen it.

from minimap2.

gringer avatar gringer commented on June 30, 2024

The display code from format.c:

   195  void mm_write_paf(kstring_t *s, const mm_idx_t *mi, const mm_bseq1_t *t, const mm_reg1_t *r, void *km, int opt_flag)
   196  {
   197          s->l = 0;
   198          mm_sprintf_lite(s, "%s\t%d\t%d\t%d\t%c\t", t->name, t->l_seq, r->qs, r->qe, "+-"[r->rev]);
   199          if (mi->seq[r->rid].name) mm_sprintf_lite(s, "%s", mi->seq[r->rid].name);
   200          else mm_sprintf_lite(s, "%d", r->rid);
   201          mm_sprintf_lite(s, "\t%d\t%d\t%d", mi->seq[r->rid].len, r->rs, r->re);
   202          if (r->p) mm_sprintf_lite(s, "\t%d\t%d", r->p->blen - r->p->n_ambi - r->p->n_diff, r->p->blen);
   203          else mm_sprintf_lite(s, "\t%d\t%d", r->fuzzy_mlen, r->fuzzy_blen);
   204          mm_sprintf_lite(s, "\t%d", r->mapq);
   205          write_tags(s, r);
   206          if (r->p && (opt_flag & MM_F_OUT_CG)) {
   207                  uint32_t k;
   208                  mm_sprintf_lite(s, "\tcg:Z:");
   209                  for (k = 0; k < r->p->n_cigar; ++k)
   210                          mm_sprintf_lite(s, "%d%c", r->p->cigar[k]>>4, "MIDN"[r->p->cigar[k]&0xf]);
   211          }
   212          if (r->p && (opt_flag & MM_F_OUT_CS))
   213                  write_cs(km, s, mi, t, r);
   214  }

It seems a bit odd that it's mi->seq[r->rid].len instead of r->l_seq, when r->qs and r->qe are used.

from minimap2.

lh3 avatar lh3 commented on June 30, 2024

This does seem like a minimap2 bug. However, I need your help to move further. Could you add the following lines to mm_write_paf():

if (!(r->rs < r->re && r->re <= mi->seq[r-rid].len)) {
  fprintf(stderr, "%s <=> %s\n", t->name, mi->seq[r->rid].name);
  abort();
}

This essentially raises an assertion failure and prints out violating sequence pairs if coordinates are incorrect. By the way, there is no mm_reg1_t::l_seq. mi->seq[r->rid].len gets the reference sequence length. That part of code is correct.

Thanks!

from minimap2.

gringer avatar gringer commented on June 30, 2024

Output from canu (from a different failed run):

$ cat mmap.000005.out 
Running job 5 based on command line options.
Fetch blocks/000003.fasta
Fetch blocks/000004.fasta
[M::main::2.008*1.00] loaded/built the index for 192000 target sequence(s)
[M::mm_mapopt_update::2.603*1.00] mid_occ = 126
[M::mm_idx_stat] kmer size: 17; skip: 11; is_HPC: 0; #seq: 192000
[M::mm_idx_stat::2.945*1.00] distinct minimizers: 21028426 (90.26% are singletons); average occurrences: 1.438; average spacing: 6.098
370985 <=> 192001
Aborted
[M::main::1.864*1.00] loaded/built the index for 192000 target sequence(s)
[M::mm_mapopt_update::2.369*1.00] mid_occ = 126
[M::mm_idx_stat] kmer size: 17; skip: 11; is_HPC: 0; #seq: 192000
[M::mm_idx_stat::2.716*1.00] distinct minimizers: 21028426 (90.26% are singletons); average occurrences: 1.438; average spacing: 6.098
[M::worker_pipeline::6.438*5.90] mapped 192000 sequences
[M::main] Version: 2.2-r424-dirty
[M::main] CMD: /home/gringer/install/canu/canu-1.6/Linux-amd64/bin/minimap2 -x ava-ont -c -k17 -w11 -t 12 ./blocks/000002.mmi queries/000005/000003.fasta
[M::main] Real time: 6.461 sec; CPU: 38.024 sec
[M::main::1.520*1.00] loaded/built the index for 192000 target sequence(s)
[M::mm_mapopt_update::2.037*1.00] mid_occ = 126
[M::mm_idx_stat] kmer size: 17; skip: 11; is_HPC: 0; #seq: 192000
[M::mm_idx_stat::2.412*1.00] distinct minimizers: 21028426 (90.26% are singletons); average occurrences: 1.438; average spacing: 6.098
[M::worker_pipeline::6.041*6.39] mapped 192000 sequences
[M::main] Version: 2.2-r424-dirty
[M::main] CMD: /home/gringer/install/canu/canu-1.6/Linux-amd64/bin/minimap2 -x ava-ont -c -k17 -w11 -t 12 ./blocks/000002.mmi queries/000005/000004.fasta
[M::main] Real time: 6.087 sec; CPU: 38.676 sec
mv: cannot stat './results/000005.mmap.counts': No such file or directory

Edit: More interesting is that this modification of minimap2 allows canu to continue because the offending lines don't appear in the output (even though the output is incomplete...).

from minimap2.

lh3 avatar lh3 commented on June 30, 2024

Could you extract the sequence pair "370985" and "192001", or could you send me the two blocks? I have run minimap2 on some pacbio reads, but could not reproduce the issue on my dataset. Without the sequences that trigger the bug, it is quite difficult to fix it. Thanks.

from minimap2.

gringer avatar gringer commented on June 30, 2024

This is weird. It's not failing when I run with just those two sequences, but it is for the whole block (both indexed and non-indexed). Anyway, here are the two sequences:

>370985
TGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTAATTCGTTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTTGGTGGTGTGTGTGTGTGG
AGTGTGTGTGTGTGTGGAGTGTGTGTGTGTGTTGGTGTGTGGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGGAGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGAGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTTGGTGATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGAG
TGTGTGTGTGTGTGTGGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGT
GTGTGGTGATCAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGATCATTGATGATCAATTGATGATTGATGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTTTGGTGATGTTGTTGTTGGTATTGTTATTGTTATGTTGTTGTTGTTGTTGTGTTATT
GTTGGTATTATTATTGTTATTCAGTATTGTTATTGTGTGTATTATTCATTGTTGATGTGC
TATTATTGTTATTGTTATTGTTATTGTTGTTATTATTTATTGTTCCGCTTCCTTCCTTTG
TACTTGTTATTCTTATTCCATTTCTTCCCTTCTTCCAGCCCCATGCTGCCACCTTATTGT
AATACCCATTAACCAACCTTCGCCTTCGCCCAACTACTGCCTTCCTTCGCCTTAACCTTC
CTTAGCCTTAGCCAGCCTTCCAACCTTCTTAACTGGCCAGCCTTAACCCAGCCCACGCCC
ACCTTCTTCCTTCTTCGCTAACCTTCGCCTTGTGCCTTAACCTTTAACCTTGTCCAGCCA
GCCAGCCAGCCAGCCTTCTTCTTCCTTCCTTCTTCCGGCCTTCGCTATTGGCCTTGTGTG
TTATTGTGTATTTATTGTATGTGTATTATTGTGCCTTCTTCTTATGTGTATTATTGTATT
ATTGTATTGTATGTTGTATTATTGTGTGTATTGGCCTTCTTCTTCTTCTTCTTTACAACC
CCACCCTTCTTCCCTTTATTGTACCCACACCCTTCCCATGCCACCTTTCTTATTGTGTTA
ACCTTGTATTCCTTGTGTGTTATTGTGTTATTCTTATTGTGGCCATTATTGTGTTATTAT
TTATTATTTTATTGTATTCTATTGTGTGTATTGTGTGTGTGTGTGTGTGTATTTGTTATT
GCTTGACTGTATTGTGTGCCATTGTGTGTGACCTTACCTTATGTTCCAGTACTAGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGCTGTGTGTGTGTGTGTGTGTGTGTGTGTATTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTTGTGTTATTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTAGTATTGTGTGTGTGTGTGTGTGTGTGTGTGTTATTATTGTTATTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTTATTGTGTGTTATTATTATTGTGTGTGTATTG
TGTATTATTATTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTTATTGGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTTGGTGGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGGTGTG
TGTGTGTGTGTGTGTGTGTGTGTCAAGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTGTG
TGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGAAGTGTGTGTGTG
TGATCAGTGTGTCAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGGTGGGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTTG
>192001
TTGTTGTACTTCGTTCGGTTACATATTGCTGGCGTCTGCTTATCGAGCAAAAGGGGTTCC
AAGTCAGAGGTTCCTTTTCTGTTGGACAACGATGTTGCTGCCATTACGGCGGACCGGCTT
GGTGGGATGGCGCTGCAGGTCGGTTCGCTTCTGCGCCGTCTCCAGAGCGTCTTTCCGTTC
CCCGCGCGGAACGCGCTCGTGAACGCGCCACACGCCAGGGGCCATGCGACGAGGGCGGCC
GAAGGCTCTACATTGAGGCAACATCGGATGGCACTTCAAACGTCTTGGAAACTTGCTGAA
ATGATGTACCAAGGGTCCCAGCACGATGGTCTACACACATTCCAGACCCTTTCTTTCCTT
GAGCCGTCTGGAGGTGCAGCTGGAGCCCATCAGGAGACTCCCTGCAGGCAGAGAAGTCTG
TGCAGGTCCCAGAGAGGTCTGTGTACAGTGACAGGTATATCTTTGCAAAAGAATCTATTT
GAAAATGGCTCCCTCAGTGACATCAGGTGGCATCTATCAGGACTGGCACTCCTTTCTCCT
GCAGGGAGTTCCGCAAACCAGCTTTGTTACACGGCTTCATCTATCTCCAGGCTTCACCCA
GGTTTGTGGAGAGACTCTACCAGAGATGAAGAAGAAAAAGGACATTGAGCTGGCCTATCT
TCAACAGCCCCATAGTCAGCACGAAGACTGGTTTTATTAACAAGACTACCAAGCTCCGCT
GAAGCTCTGCAGCACGTGCCAGTGCTGGTGCTGGATCCACTGAGGCTTCCACTCTGAAAA
TGCCCGCCAGGCAGGGGAGAGCTCATGGGACAGGTGAACACCTTTATGAGGAACTTAACA
ATAAGGACGTGGCTGTGGTCTGGACTCTGACTTTCTCAGATCTGCGACTGCTCTTATGCA
CATCCCCTTTGGCCCCTCAGCCGCCCCCACACCTGGGTGTGTGCTGTCTTTATTACTGCA
TTGTTTTGTTTTTTAATCTCAAGCACCTTGAAATAAGAAAGCTGACTTTGCTATTTTTCT
AAAAAGATAGAGCGACAGGCAGAGGAACCTCTCTTGGAACCTCTCGGTTAAACACCCAAG
CAGACGCCAGCAATACGTAACGC

The entire indexed block can be found temporarily in my Dropbox code_scratch folder here. This index causes the issue when paired with the above highly-repetitive sequence.

from minimap2.

gringer avatar gringer commented on June 30, 2024

Looking at the fix, does this mean that all reverse mappings were given the wrong ID? I'm just wondering if I should remap my sequences with the fixed minimap2.

from minimap2.

lh3 avatar lh3 commented on June 30, 2024

It is inversion, not reverse.

from minimap2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.