
Comments (18)

rob-p avatar rob-p commented on June 10, 2024

Strange --- were there any complaints during index creation? Was the index created successfully? Since there's no core dump, the only thought is that I could try to re-create on a small sample (the reference plus a small set of reads).

from salmon.

mdshw5 avatar mdshw5 commented on June 10, 2024

Confirmed with v0.6.0:

Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
# salmon (mapping-based) v0.6.0
# [ program ] => salmon
# [ command ] => quant
# [ index ] => { ... }
# [ libType ] => { IU }
# [ mates1 ] => { ... }
# [ mates2 ] => { ... }
# [ output ] => {... }
# [ threads ] => { 16 }
Logs will be written to ...
there is 1 lib
[2016-01-22 17:59:17.894] [jointLog] [info] parsing read library format
Loading 32-bit quasi index
[2016-01-22 17:59:18.735] [stderrLog] [info] Loading Suffix Array
[2016-01-22 17:59:18.736] [stderrLog] [info] Loading Position Hash
[2016-01-22 17:59:18.731] [jointLog] [info] Loading Quasi index
[2016-01-22 18:00:59.879] [stderrLog] [info] Loading Transcript Info
[2016-01-22 18:01:25.157] [stderrLog] [info] Loading Rank-Select Bit Array
[2016-01-22 18:01:30.642] [stderrLog] [info] There were 552702 set bits in the bit array
[2016-01-22 18:01:31.487] [stderrLog] [info] Computing transcript lengths
[2016-01-22 18:01:31.491] [stderrLog] [info] Waiting to finish loading hash
Index contained 552702 targets
[2016-01-22 18:04:43.717] [jointLog] [info] done
[2016-01-22 18:04:43.717] [stderrLog] [info] Done loading index

I'll check the index creation logs, but didn't notice anything out of the ordinary...


rob-p avatar rob-p commented on June 10, 2024

It seems there are a bunch of targets --- ~500k. That's not a problem, but does that sound right for this reference?


mdshw5 avatar mdshw5 commented on June 10, 2024

Yes :)


rob-p avatar rob-p commented on June 10, 2024

One more question --- what is the approximate size (in nucleotides) of the reference? If it's greater than ~2.14 billion, then it should be using the 64-bit index, which this one is not.

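The threshold rob-p describes can be sketched in a few lines: sum the sequence lengths in the FASTA and pick the offset width accordingly. This is an illustrative sketch of the decision rule described above (offsets into the concatenated reference must fit in a signed 32-bit integer), not salmon's actual implementation; the function names are made up.

```python
# Sketch of the index-width decision: use a 32-bit index when the total
# reference length fits below 2^31 nucleotides, otherwise fall back to 64-bit.
# FASTA parsing here is deliberately minimal (no gzip, no wrapped-line edge cases).

def total_fasta_length(lines):
    """Sum sequence lengths over an iterable of FASTA lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total

def index_width(total_nt):
    """Offset width (in bits) needed to address every reference position."""
    return 32 if total_nt < 2**31 else 64

fasta = [">tx1", "ACGT" * 10, ">tx2", "GGCC"]
print(index_width(total_fasta_length(fasta)))  # prints 32: a tiny reference
```

The reference discussed later in the thread (1,486,025,420 nt) sits below 2^31 - 1 = 2,147,483,647, so the 32-bit index seen in the quant log is the expected choice.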

mdshw5 avatar mdshw5 commented on June 10, 2024

That was going to be my next question!


rob-p avatar rob-p commented on June 10, 2024

If it is a large (i.e. > 2^31 nucleotide reference) then it should trigger the 64-bit index automatically. If there's a failure to do that, it's a bug I have to fix in RapMap. Admittedly, I've not tried to map to many transcriptomes that large, so I'd be much obliged if you could provide me with an example to trigger that behavior :).


mdshw5 avatar mdshw5 commented on June 10, 2024

The indexing log shows nothing out of the ordinary:

[2016-01-22 15:11:57.283] [jointLog] [info] building index
[2016-01-22 15:40:12.318] [jointLog] [info] done building index

There was actually a blank line at the very end of the transcriptome FASTA, which I thought might be related to #22, so I removed this line and re-indexed, but I see the same behavior. I'll check the nucleotide size of the transcriptome now.

cc @jmerkin


mdshw5 avatar mdshw5 commented on June 10, 2024

The nucleotide size is 1486025420, so we are using the correct bit depth. I'm checking the FASTA headers for strange characters that might cause a parsing issue. Anything that I should look for?

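A quick diagnostic for the header check mentioned above: scan each FASTA header for bytes outside a conservative character set (tabs, control characters, and other non-printables are the usual suspects). This is purely an illustrative sketch of such a scan, not anything from salmon; the "safe" set here is an assumption and may be stricter than what the parser actually tolerates.

```python
# Flag FASTA headers containing characters outside a conservative whitelist.
import string

SAFE = set(string.ascii_letters + string.digits + " _.|:-")

def odd_header_chars(lines):
    """Yield (line_number, header, offending_chars) for suspicious headers."""
    for n, line in enumerate(lines, 1):
        if line.startswith(">"):
            bad = sorted(set(line[1:].rstrip("\n")) - SAFE)
            if bad:
                yield n, line.rstrip("\n"), bad

lines = [">tx1 gene=ABC", "ACGT", ">tx2\tweird\x07name", "ACGT"]
for n, header, bad in odd_header_chars(lines):
    print(n, bad)  # reports the header with a tab and a control character
```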

rob-p avatar rob-p commented on June 10, 2024

Yes; that should definitely be identified as 32-bit. The way the parser works is that it "chops" the header at the first whitespace character. I can't think of anything that would cause failure during mapping (but bugs come from exactly the kind of thing you can't think of). Something that might cause an issue, now that I think about it, is a complete poly-A transcript. The indexer will attempt to clip poly-A tails (if a transcript ends with > 10 A's, it will clip all of the trailing A's). If this causes the entire sequence to disappear, it might cause an issue. Also, I hadn't given deep consideration to what might happen if a transcript is shorter than the k-mer size (default 31) used for hashing --- so I might also check for very short transcripts.

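The two checks rob-p suggests can be sketched directly from his description: clip a trailing poly-A run when it exceeds 10 A's, then flag transcripts that end up shorter than k (default 31), since those cannot contribute a single k-mer. This mirrors the behavior as described in the comment above, not salmon's code; function names are illustrative.

```python
# Sketch of the indexing-time checks described above: poly-A clipping and
# a shorter-than-k screen. A fully poly-A transcript clips down to length 0.

K = 31           # default k-mer size mentioned above
MIN_POLYA = 10   # clip only when the trailing run exceeds this

def clip_polya(seq):
    """If the sequence ends in more than MIN_POLYA A's, drop all trailing A's."""
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) > MIN_POLYA:
        return stripped
    return seq

def problem_transcripts(records):
    """Yield names of transcripts that vanish or fall below k after clipping."""
    for name, seq in records:
        if len(clip_polya(seq)) < K:
            yield name

records = [("tx_polyA", "A" * 50), ("tx_short", "ACGT"), ("tx_ok", "ACGT" * 20)]
print(list(problem_transcripts(records)))  # the all-A and too-short transcripts
```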

mdshw5 avatar mdshw5 commented on June 10, 2024

Thanks for the suggestions. I'm building the quasi index using RapMap now. If I get the same behavior I'll try to debug on my end before leaning more on you.


mdshw5 avatar mdshw5 commented on June 10, 2024

RapMap works fine with this set of transcripts. Indexing:

$ rapmap pseudoindex -k 31 -i /path/to/output -t /path/to/transcripts.fa
RapMap Indexer

[Step 1 of 4] : counting k-mers
counted k-mers for 550000 transcripts
Elapsed time: 3526.23s
Clipped poly-A tails from 2375 transcripts

[Step 2 of 4] : marking k-mers
marked kmers for 550000 transcripts
Elapsed time: 1295.67s

[Step 3 of 4] : building k-mers equivalence classes
done! There were 5077370 classes
Elapsed time: 1351.53s

[Step 4 of 4] : finalizing index
finalized kmers for 550000 transcripts
Elapsed time: 4424.16s
Writing the index to test3/
transcriptIDs.size() = 1419746642
parsed 552702 transcripts
There were 1015977902 distinct k-mers (canonicalized)

which looks fine, and then alignments are generated and RapMap exits with no errors.


rob-p avatar rob-p commented on June 10, 2024

That is . . . strange! Salmon literally uses the RapMap index (and the RapMap functions) directly to obtain the quasi-mappings. One thing I noticed is that you seem to be using pseudoindex which is our independent re-implementation of pseudo-alignment. However, Salmon (and Sailfish) use quasi-mapping (RapMap's quasiindex and quasimap commands, as we found this to be more accurate). I presume that if you used the quasi-mapping functionality, you might observe the bug. If you don't (i.e. if RapMap performs quasi-mapping properly), then this is a real thinker (and I'd be happy to take a look myself if you can share the file).

P.S. The same caveat I mentioned above may apply. That is, it is possible that a poly-A transcript that is completely removed from the input could cause a problem unless we check for it in the quasi-index, but may not affect the pseudo-index. This is because the quasi-index relies on a packed representation of the transcriptome and an associated sparse bit-vector to perform the mapping, and it assumes that all of the transcripts will have a non-zero length (if this is the culprit, it is, of course, easy to fix with an explicit check). You could also test this hypothesis by generating the quasi-index with the --noClip option, which will disable poly-A clipping when building the index.

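A toy model helps make the zero-length hazard concrete. In the packed representation described above, transcripts are concatenated into one string and a bit vector marks the end position of each transcript, so a rank-style query maps a global offset back to its transcript. The sketch below is an assumption-laden simplification (a plain list stands in for the real rank-select bit vector, and all names are invented), but it shows how an empty transcript produces two boundary marks at the same position, violating exactly the non-zero-length assumption mentioned in the comment.

```python
# Toy packed transcriptome: concatenated text plus end-position "bits".

def pack(transcripts):
    """Concatenate sequences; record each transcript's end offset."""
    text, bits = "", []
    for seq in transcripts:
        text += seq
        bits.append(len(text) - 1)  # end offset; collides if seq is empty
    return text, bits

def transcript_of(bits, pos):
    """Rank-style query: index of the transcript containing global offset pos."""
    return sum(1 for b in bits if b < pos)

text, bits = pack(["ACGT", "GG", "TTTT"])
print(transcript_of(bits, 5))   # prints 1: offset 5 lies in the second transcript

# A transcript clipped to length zero duplicates a boundary mark:
_, bad_bits = pack(["ACGT", "", "GG"])
print(bad_bits)                 # two entries collide at the same offset
```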

mdshw5 avatar mdshw5 commented on June 10, 2024

Sorry for the delay in responding to this, and thanks for your support on debugging potential issues. I ran RapMap (nice tool) successfully and then realized that I was just encountering the linux OOM killer... So I'm closing this issue as there really was no issue. Thanks again.


rob-p avatar rob-p commented on June 10, 2024

No problem! We're actually working now on an optional use of a perfect hash in the quasi-index. It increases index construction times, but provides the same speed of lookup as the current hash. Also, it reduces the memory usage by a factor of ~2. We just have to figure out how to implement this cleanly in the code base.


mdshw5 avatar mdshw5 commented on June 10, 2024

BIGGG +1 for this
cc @jmerkin


mdshw5 avatar mdshw5 commented on June 10, 2024

If you need testers for this I'm glad to help.


rob-p avatar rob-p commented on June 10, 2024

It's currently being developed here — https://github.com/COMBINE-lab/RapMap/tree/quasi-mph. Once we're convinced RapMap still behaves correctly when using the perfect hash index, then I have some (not too much) work to do to propagate the necessary changes to Sailfish & Salmon. The option is currently functional. If you grab this branch and build a quasi index using the -p option, it will use the emphf library to build the hash rather than a google dense hash (with a concordant decrease in memory usage).


