
Comments (2)

clemaitre commented on July 18, 2024

Hi,

Thank you for reporting this issue.
We were not aware that the suffix number could be any integer, and this case was not anticipated in the LRez code: "-1" suffixes are simply removed from the barcode tag, and any other integer suffix is not recognized.

In the case of different suffix numbers in a single BAM, the expected behaviour of LRez would be to consider as two distinct barcodes two barcodes that share the same nucleotide barcode sequence but have different suffix numbers, is that correct?

This may not be straightforward to implement in LRez, since LRez assumes all barcodes are purely nucleotide sequences and encodes them into integers with a 2-bit encoding. The suffix numbers could be converted to nucleotide words appended to the barcodes, but this would cost extra space for the vast majority of datasets, which only have the "-1" suffix. To minimize that extra space, we would need to know in advance the maximal number of different integer suffixes for the given sample.
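To make the space trade-off concrete, here is a minimal sketch of a 2-bit barcode encoding and of the extra bases needed to encode the suffix as a nucleotide word. It is not LRez's actual code: the function names and the 16-base barcode length used in the usage note are assumptions made for illustration.

```python
# Hypothetical sketch, not LRez's implementation.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_2bit(seq: str) -> int:
    """Pack a nucleotide string into an integer, 2 bits per base.

    Leading A's map to zero bits, so this is only unambiguous when all
    barcodes have the same fixed length.
    """
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def suffix_word_length(n_suffixes: int) -> int:
    """Smallest number of bases that can distinguish n_suffixes libraries."""
    length = 0
    while 4 ** length < n_suffixes:
        length += 1
    return length


def bits_needed(barcode_length: int, n_suffixes: int) -> int:
    """Bits per encoded barcode once the suffix word is appended."""
    return 2 * (barcode_length + suffix_word_length(n_suffixes))
```

Under the assumption of 16-base barcodes, `bits_needed(16, 1)` is 32 while `bits_needed(16, 10)` is 36, so even a short suffix word pushes the encoding past a 32-bit integer.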

Do you have an idea of this maximal number of different integer suffixes in practice? In your opinion, does this situation (BAMs with multiple 10X libraries) occur frequently in practice?

Note that a temporary (though not very neat or practical) solution would be to pre-process the BAM by replacing the -X suffixes with short nucleotide words specific to each library.
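A rough sketch of that pre-processing step, applied to barcode strings only (reading and rewriting the BAM tags is left out). The mapping from suffix number to nucleotide word, the two-base word length, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical sketch: rewrite "SEQ-N" barcodes as pure nucleotide strings.
BASES = "ACGT"


def suffix_to_word(suffix: int, word_length: int = 2) -> str:
    """Map suffix 1, 2, 3, ... to a fixed-length word (AA, AC, AG, ...)."""
    index = suffix - 1
    if not 0 <= index < 4 ** word_length:
        raise ValueError(f"suffix {suffix} does not fit in {word_length} bases")
    word = []
    for _ in range(word_length):
        index, digit = divmod(index, 4)
        word.append(BASES[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(word))


def replace_suffix(barcode: str, word_length: int = 2) -> str:
    """Replace a trailing "-N" suffix by its nucleotide word."""
    sequence, separator, suffix = barcode.rpartition("-")
    if not separator:
        return barcode  # no suffix present, leave the barcode untouched
    return sequence + suffix_to_word(int(suffix), word_length)
```

With these assumptions, `replace_suffix("ACGT-1")` gives `"ACGTAA"` and `replace_suffix("ACGT-6")` gives `"ACGTCC"`, so the same sequence from two libraries yields two distinct pure-nucleotide barcodes.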

Best,
Claire

from lrez.

pontushojer commented on July 18, 2024

Thanks for the quick reply!

> In the case of different suffix numbers in a single BAM, the expected behaviour of LRez would be to consider as two distinct barcodes two barcodes that share the same nucleotide barcode sequence but have different suffix numbers, is that correct?

Yes, this is correct, as the same nucleotide barcode sequence could have been sampled in multiple library preparations.

> This may not be straightforward to implement in LRez, since LRez assumes all barcodes are purely nucleotide sequences and encodes them into integers with a 2-bit encoding. The suffix numbers could be converted to nucleotide words appended to the barcodes, but this would cost extra space for the vast majority of datasets, which only have the "-1" suffix. To minimize that extra space, we would need to know in advance the maximal number of different integer suffixes for the given sample.

> Do you have an idea of this maximal number of different integer suffixes in practice? In your opinion, does this situation (BAMs with multiple 10X libraries) occur frequently in practice?

I expected this might be an issue. As for the maximal number of expected suffixes, I don't have a good answer. Clearly most people only use BAMs with one suffix ("-1"). I have merged as many as 6 different libraries myself, that is, 6 different suffixes in one BAM. I am not sure how common this is for other people, however. A fairly safe estimate for the maximum number of integer suffixes would probably be around 10.

> Note that a temporary (though not very neat or practical) solution would be to pre-process the BAM by replacing the -X suffixes with short nucleotide words specific to each library.

Yes, I suppose this would be a solution.

An even simpler solution, and a more practical one for now, would be to just ignore any barcode suffix when building the index. I think this would be an acceptable solution for the time being, as I am not sure this is a big problem for other users. One could also always confirm which suffix is present on an alignment after accessing it.
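A small sketch of that idea, using a plain list of barcode strings as a stand-in for the alignments in a BAM. The index is keyed on the bare nucleotide sequence, and the suffix is only checked after the candidate alignments have been fetched. The names here are hypothetical and do not correspond to LRez functions.

```python
# Hypothetical sketch: suffix-agnostic index with a post-lookup suffix check.
from collections import defaultdict


def strip_suffix(barcode: str) -> str:
    """Drop any "-N" suffix, keeping only the nucleotide sequence."""
    return barcode.split("-", 1)[0]


def build_index(alignment_barcodes):
    """Map each bare barcode sequence to the positions of its alignments."""
    index = defaultdict(list)
    for position, barcode in enumerate(alignment_barcodes):
        index[strip_suffix(barcode)].append(position)
    return index


def lookup(index, alignment_barcodes, barcode):
    """Fetch candidates by bare sequence, then keep those whose full
    barcode (suffix included) matches the query."""
    candidates = index.get(strip_suffix(barcode), [])
    return [p for p in candidates if alignment_barcodes[p] == barcode]
```

For example, with `["ACGT-1", "ACGT-2", "TTTT-1", "ACGT-1"]` the index groups positions 0, 1 and 3 under `ACGT`, and `lookup` for `"ACGT-2"` keeps only position 1 after checking the suffix.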

