
mizraelson commented on July 18, 2024

Hi,
I will look into this and get back to you with a solution. Can you confirm it's a single-end BAM file?

from mixcr.

xmy1990 commented on July 18, 2024

I can confirm that it is a paired-end BAM file!
Thank you!

mizraelson commented on July 18, 2024

Does the same error occur if you run MiXCR on the raw data that hasn't been preprocessed with samtools?

xmy1990 commented on July 18, 2024

I have run MiXCR successfully on the raw BAM data (50+ GB), but it took more than 10 hours. To speed MiXCR up, I wanted to preprocess the data with samtools and keep only the V(D)J regions from a target BED file.

Do you have any other suggestions for speeding up MiXCR on large raw BAM files (50+ GB)?
I have used --threads 8, but the speed still did not meet my expectations.
Thanks a lot!

mizraelson commented on July 18, 2024

I've been able to replicate the error you encountered. The issue arises from the fact that the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned. We're actively working on a solution for this and I will get back to you with a fix asap.
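To check whether a particular BAM file actually contains such a mix, one can look at the FLAG field of each record. The following is a minimal Python sketch, assuming the BAM has first been converted to SAM text (for example with `samtools view`); the function names are hypothetical, and secondary or supplementary records are not treated specially.

```python
# Hypothetical helper: classify SAM records by their FLAG bits.
# Bit values follow the SAM format specification.
PAIRED = 0x1         # read is part of a pair (template has multiple segments)
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # this read's mate is unmapped


def classify(flag: int) -> str:
    """Return a coarse alignment category for one SAM FLAG value."""
    if not flag & PAIRED:
        return "single-end read"
    if flag & UNMAPPED and flag & MATE_UNMAPPED:
        return "pair, both unmapped"
    if flag & UNMAPPED or flag & MATE_UNMAPPED:
        return "pair, only one mate aligned"
    return "pair, both mates aligned"


def count_categories(sam_lines):
    """Tally categories over SAM text lines, skipping '@' header lines."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        category = classify(flag)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

A non-zero "pair, only one mate aligned" count would indicate exactly the kind of mixed paired/single-end content described above.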

Separately, may I ask how many CPUs you're currently using? 50+ GB raw files are substantial, and the most resource-intensive step when processing exome-seq data is alignment, which depends heavily on the CPU. If you're working in a cluster environment, allocating more CPUs to MiXCR should speed up processing.

xmy1990 commented on July 18, 2024

Great, I look forward to your fix.

To speed up MiXCR, I have tried:
--threads 8, virtual_free=32g, num_proc=16 (settings related to the number of CPUs)
With the above parameters, some samples ran successfully, while others (even with BAM files of the same size as the successful ones) failed with the following error:

java.lang.IllegalStateException: Unexpected buffer behaviour. This might be a sign that you ran out of storage in the output or tmp folder.
        at com.milaboratory.o.zR.completed(SourceFile:1582)
        at com.milaboratory.o.Bk$a.completed(SourceFile:1163)
        at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)
        at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl$3.run(SimpleAsynchronousFileChannelImpl.java:389)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

Also, the speed did not improve significantly.
Do you have any other suggestions for setting --threads, virtual_free, and num_proc to speed up MiXCR on large raw BAM files (50+ GB)? I'm not very experienced with choosing these values.

Looking forward to your response, thanks!

xmy1990 commented on July 18, 2024

I have another question about your detailed reply:

"I've been able to replicate the error you encountered. The issue arises from the fact that the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned. We're actively working on a solution for this and I will get back to you with a fix asap."

As I understand it, both the raw BAM data and the BAM data preprocessed with samtools would match your description:

"the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned."

So why did MiXCR run successfully on the raw BAM data but fail on the BAM data preprocessed with samtools?

Thanks a lot!

mizraelson commented on July 18, 2024

Hi,

What cluster architecture do you use to run MiXCR? Could you please share the exact script you're using, including all the parameters (like virtual_free, num_proc, etc.)? The error you're experiencing seems to stem from a storage space limitation. By using the --use-local-temp parameter, MiXCR will utilize the local storage space where the program is running, rather than defaulting to the /tmp directory, which often resolves this issue.
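Before rerunning, it can help to compare the free space in the default temporary directory with the free space where MiXCR writes its output. A minimal Python sketch using only the standard library; the function names and the `min_free_gib` threshold are hypothetical (MiXCR does not state a fixed requirement), and Python's `tempfile.gettempdir()` (usually /tmp, or `$TMPDIR` if set) is only an approximation of where the JVM places temporary files.

```python
import shutil
import tempfile


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def report_space(output_dir: str, min_free_gib: float) -> dict:
    """Check free space in the system temp dir and in `output_dir`.

    `min_free_gib` is a user-chosen, hypothetical threshold.
    Returns {path: (free GiB rounded to 0.1, enough space?)}.
    """
    report = {}
    for path in (tempfile.gettempdir(), output_dir):
        free = free_gib(path)
        report[path] = (round(free, 1), free >= min_free_gib)
    return report
```

If the temp directory shows far less free space than the output directory, that is consistent with the storage explanation above and with trying `--use-local-temp`.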

Regarding your other question, my hypothesis is that with the full genome, every read aligns somewhere. Reads might not align in a highly targeted manner and might instead end up in lower-scoring alignments, but their mates are still present in the file. When you filter the data by genomic region, however, one read of a pair can be kept while its mate, which aligned to a different genomic location, is discarded.
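The effect described in this hypothesis can be illustrated with a toy example. This is a minimal Python sketch on made-up records of the form (read_name, chromosome, position), assuming each pair contributes exactly two records; `filter_region` and `orphaned_names` are hypothetical helpers, not samtools or MiXCR functions.

```python
from collections import Counter


def filter_region(records, chrom, start, end):
    """Keep (name, chrom, pos) records inside [start, end] on `chrom`,
    roughly what restricting a BAM to a BED region does."""
    return [r for r in records if r[1] == chrom and start <= r[2] <= end]


def orphaned_names(records):
    """Read names seen only once, i.e. whose mate is no longer present."""
    counts = Counter(name for name, _, _ in records)
    return sorted(name for name, n in counts.items() if n == 1)


# Made-up toy data: read "b" has one mate on chr7 and the other on chr2.
reads = [
    ("a", "chr7", 1000), ("a", "chr7", 1200),
    ("b", "chr7", 1050), ("b", "chr2", 50000),
    ("c", "chr2", 7000), ("c", "chr2", 7300),
]
```

Before filtering, `orphaned_names(reads)` is empty; after `filter_region(reads, "chr7", 900, 1300)`, read "b" is left without its mate, which is the kind of single-end record that appears only in the region-filtered file.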

Additionally, the fix is now available for you to test. Please download the development version. Let me know how it works for you.

xmy1990 commented on July 18, 2024

Hi,
I greatly appreciate your prompt response!
I have tested the development version: it worked well on my limited set of samples, and the processing speed also met my needs.

Separately, may I ask how you handled the mix of paired-end and single-end alignments? Do you just discard the single-end alignments (as with samtools view -f 0x2; see the flag descriptions at http://www.htslib.org/doc/samtools.html)?
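For reference, `samtools view -f 0x2` keeps only records whose FLAG has every bit in 0x2 set, i.e. reads flagged as properly paired; `-F` does the opposite and drops records that have any of the given bits. A minimal Python sketch of those two rules applied to plain integer FLAG values (the function names are hypothetical):

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: read is mapped in a proper pair
UNMAPPED = 0x4     # SAM FLAG bit: read is unmapped


def require_all(flag: int, required: int) -> bool:
    """Like `samtools view -f REQUIRED`: keep only if ALL required bits are set."""
    return (flag & required) == required


def exclude_any(flag: int, excluded: int) -> bool:
    """Like `samtools view -F EXCLUDED`: drop if ANY excluded bit is set."""
    return (flag & excluded) == 0
```

In a toy list of flags such as 99, 147, 73, 133 and 0, only 99 and 147 pass `require_all(flag, PROPER_PAIR)`, so records from pairs where only one mate aligned (73, 133) would be dropped by this filter.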

Thanks a lot!

mizraelson commented on July 18, 2024

No, no reads are discarded. If a record contains only one read, it is still used for alignment.
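Conceptually, using every record without discarding anything amounts to grouping records by read name and treating each group as either a pair or a single read. The sketch below is a hypothetical Python illustration of that idea, not MiXCR's actual implementation; records are simplified to (read_name, sequence) tuples.

```python
from collections import defaultdict


def group_reads(records):
    """Group (read_name, sequence) records into paired or single-read units.

    Hypothetical illustration only; a real tool also has to handle read
    orientation, secondary alignments, and mates stored far apart in the file.
    """
    by_name = defaultdict(list)
    for name, seq in records:
        by_name[name].append(seq)
    units = []
    for name, seqs in by_name.items():  # dicts preserve first-seen order
        if len(seqs) == 2:
            kind = "paired"
        elif len(seqs) == 1:
            kind = "single"
        else:
            kind = "ambiguous"  # e.g. multiple alignments per read name
        units.append((name, kind, seqs))
    return units
```

A single read is passed on as its own unit rather than dropped, which matches the behaviour described in the reply.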

xmy1990 commented on July 18, 2024

"If a record has one read it will still be used for alignment."
Will single-end alignments affect the accuracy of the results?

There's one lingering question I still haven't been able to fully grasp: both the raw BAM data and the BAM data preprocessed with samtools contained a mix of paired-end and single-end alignments, so why did MiXCR only run successfully on the raw BAM data?

mizraelson commented on July 18, 2024

Single-end alignments won't affect accuracy. If a read is aligned to the TCR/BCR reference, it will be used for analysis; if not, it will be discarded.

It's hard to tell without seeing the actual file. Do you mean that MiXCR (not the development version) successfully analyzed the full dataset? Are you certain that single-end alignments were present?

mizraelson commented on July 18, 2024

I will close the issue for now. Feel free to re-open.
