Hi, I first extracted partial reads from raw BAM based on VD

Error while running command align: java.lang.IllegalArgumentException (mixcr, closed)

xmy1990 commented on August 17, 2024

Error while running command align java.lang.IllegalArgumentException


Comments (13)

mizraelson commented on August 17, 2024

Hi,
I will look into that and get back to you with a solution. Can you confirm it's a single-end BAM file?


xmy1990 commented on August 17, 2024

I confirmed that it is a paired-end BAM file!
Thank you!


mizraelson commented on August 17, 2024

Does the same error occur if you run MiXCR on the raw data that hasn't been preprocessed with samtools?


xmy1990 commented on August 17, 2024

I have run MiXCR successfully on the raw BAM data (50+ GB), but it took more than 10 hours. To speed MiXCR up, I preprocessed the BAM with samtools to select the VDJ regions using a target BED file.

Do you have any other suggestions for improving MiXCR speed on large raw BAM files (50+ GB)?
I have used --threads 8, but the speed still falls short of my expectations.
Thanks a lot!
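The region-selection step described above can be scripted. The sketch below only builds the samtools command; the file names and thread count are placeholders. `-b` writes BAM, `-L` keeps alignments overlapping the BED regions, `-@` adds compression threads and `-o` names the output. Note that `-L` filters record by record, so a read whose mate aligned outside the BED regions is kept without its mate.

```python
import shlex
from typing import List


def build_region_filter_cmd(in_bam: str, bed: str, out_bam: str, threads: int = 4) -> List[str]:
    """Return a samtools command that keeps only alignments overlapping `bed`.

    All file names are placeholders chosen for illustration.
    """
    return [
        "samtools", "view",
        "-b",               # write BAM output
        "-L", bed,          # keep records overlapping the BED regions
        "-@", str(threads), # extra compression threads
        "-o", out_bam,      # output file
        in_bam,
    ]


if __name__ == "__main__":
    # Hypothetical inputs; pass the list to subprocess.run(cmd, check=True) to execute it.
    cmd = build_region_filter_cmd("sample.bam", "vdj_targets.bed", "sample.vdj.bam", threads=8)
    print(shlex.join(cmd))
```

Building an argument list rather than a single shell string avoids quoting problems with unusual file names.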


mizraelson commented on August 17, 2024

I've been able to replicate the error you encountered. The issue arises from the fact that the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned. We're actively working on a solution for this and I will get back to you with a fix asap.
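One way to see whether a BAM mixes paired-end and single-end records is to tally the FLAG field (column 2 of `samtools view` output). This is a minimal sketch using the bit values from the SAM specification; the category names and the example FLAG values are illustrative, and it is not how MiXCR itself inspects input.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify(flag: int) -> str:
    """Put one alignment record into a coarse pairing category."""
    if not flag & PAIRED:
        return "single-end"
    if flag & UNMAPPED:
        return "paired, this read unmapped"
    if flag & MATE_UNMAPPED:
        return "paired, only this read mapped"
    return "paired, both mapped"


def summarize(flags: Iterable[int]) -> Counter:
    """Count categories, skipping secondary and supplementary records."""
    return Counter(classify(f) for f in flags if not f & (SECONDARY | SUPPLEMENTARY))


if __name__ == "__main__":
    # Hypothetical FLAG values; 2145 is a supplementary record and is skipped.
    example_flags = [99, 147, 73, 133, 0, 16, 2145]
    for category, n in summarize(example_flags).items():
        print(f"{category}: {n}")
```

Any non-zero count outside "paired, both mapped" indicates the kind of mixed input described above.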

Separately, may I ask how many CPUs you're currently using? With raw files of 50+ GB, the most resource-intensive step for exome-seq data is alignment, which depends heavily on the CPU. If you're working in a cluster environment, allocating more CPUs to MiXCR should speed up processing.


xmy1990 commented on August 17, 2024

Great, I look forward to your fix.

To speed up MiXCR, I have tried:
--threads 8, virtual_free=32g, num_proc=16 (which relate to the number of CPUs)
With these parameters, some samples ran successfully, while others failed (even samples whose BAM files were the same size as the successful ones).
The failed runs reported the following error:

java.lang.IllegalStateException: Unexpected buffer behaviour. This might be a sign that you ran out of storage in the output or tmp folder.
        at com.milaboratory.o.zR.completed(SourceFile:1582)
        at com.milaboratory.o.Bk$a.completed(SourceFile:1163)
        at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)
        at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl$3.run(SimpleAsynchronousFileChannelImpl.java:389)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

Also, the speed did not improve significantly.
Do you have any suggestions for setting --threads, virtual_free, and num_proc to improve MiXCR speed on large raw BAM files (50+ GB)? I'm not very experienced with these settings.

Looking forward to your response, thanks!


xmy1990 commented on August 17, 2024

I have another question about your detailed reply:

I've been able to replicate the error you encountered. The issue arises from the fact that the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned. We're actively working on a solution for this and I will get back to you with a fix asap.

As I understand it, both the raw BAM data (not preprocessed with samtools) and the BAM data preprocessed with samtools would match this description:

the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned.

So why did MiXCR run successfully on the raw BAM data but fail on the BAM data preprocessed with samtools?

Thanks a lot!


mizraelson commented on August 17, 2024

Hi,

What cluster architecture do you use to run MiXCR? Could you please share the exact script you're using, including all the parameters (like virtual_free, num_proc, etc.)? The error you're experiencing seems to stem from a storage space limitation. By using the --use-local-temp parameter, MiXCR will utilize the local storage space where the program is running, rather than defaulting to the /tmp directory, which often resolves this issue.
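Since the stack trace above points to storage running out in the output or tmp folder, it can help to check free space before launching a run. This is a minimal sketch using Python's standard library; the candidate directories and the 100 GiB threshold are hypothetical placeholders, not MiXCR requirements.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def free_gb(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def pick_temp_dir(candidates: Iterable[str], required_gb: float) -> Optional[str]:
    """Return the first existing directory with at least `required_gb` free, else None."""
    for d in candidates:
        p = Path(d)
        if p.is_dir() and free_gb(str(p)) >= required_gb:
            return str(p)
    return None


if __name__ == "__main__":
    candidates = ["/tmp", "."]  # default tmp folder and a hypothetical output folder
    for d in candidates:
        if Path(d).is_dir():
            print(f"{d}: {free_gb(d):.1f} GiB free")
    print("usable:", pick_temp_dir(candidates, required_gb=100))
```

If only the output folder has enough room, that is the situation where the `--use-local-temp` suggestion applies.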

Regarding your other question, my hypothesis is that with the full genome, every read aligns somewhere. Reads do not necessarily align in a highly targeted manner and may instead be part of lower-scoring alignments. However, when you filter the data by chromosome, some reads of a pair might be discarded because their mates aligned to a different genomic location.
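The hypothesis above can be illustrated with a toy model: filter records one at a time by chromosome, then look for read names left with a single record. The records and chromosome names are hypothetical, secondary and supplementary alignments are ignored, and this is not MiXCR code.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set


class Aln(NamedTuple):
    qname: str  # read name shared by both mates
    chrom: str  # reference sequence the record aligned to


def keep_chromosomes(records: List[Aln], wanted: Set[str]) -> List[Aln]:
    """Record-by-record filter, like restricting a BAM to chosen regions."""
    return [r for r in records if r.chrom in wanted]


def orphans(records: List[Aln]) -> List[str]:
    """Read names that have exactly one record left, i.e. lost their mate."""
    counts = defaultdict(int)
    for r in records:
        counts[r.qname] += 1
    return sorted(q for q, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Hypothetical pairs: read_b has one mate on chr14 and the other on chr2.
    full = [
        Aln("read_a", "chr14"), Aln("read_a", "chr14"),
        Aln("read_b", "chr14"), Aln("read_b", "chr2"),
        Aln("read_c", "chr7"), Aln("read_c", "chr7"),
    ]
    print("orphans before:", orphans(full))
    print("orphans after:", orphans(keep_chromosomes(full, {"chr14", "chr7"})))
```

In the full data every pair is complete; after filtering, `read_b` keeps only its chr14 record.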

Additionally, the fix is now available for you to test. Please download the development version. Let me know how it works for you.


xmy1990 commented on August 17, 2024

Hi,
I greatly appreciate your prompt response!
I have tested the development version: it worked well on my (limited) samples, and the processing speed also met my needs.

Separately, may I ask how you handled the mix of paired-end and single-end alignments? Do you simply discard the single-end alignments (like samtools view -f 0x2; see the flag descriptions here: http://www.htslib.org/doc/samtools.html)?
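For reference, `samtools view -f` keeps a record only when all of the given FLAG bits are set, and `-F` drops it when any of them is set. The sketch below mimics those two rules on plain integers; the example FLAG values are hypothetical. With `-f 0x2` (the aligner's "properly paired" bit), single-end records and records whose mate is unmapped are dropped.

```python
from typing import Iterable, List

PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned per the aligner


def passes_f(flag: int, required: int) -> bool:
    """Like `samtools view -f REQUIRED`: keep only if all REQUIRED bits are set."""
    return (flag & required) == required


def passes_F(flag: int, excluded: int) -> bool:
    """Like `samtools view -F EXCLUDED`: drop if any EXCLUDED bit is set."""
    return (flag & excluded) == 0


def keep_proper_pairs(flags: Iterable[int]) -> List[int]:
    """Records that `-f 0x2` would retain."""
    return [f for f in flags if passes_f(f, PROPER_PAIR)]


if __name__ == "__main__":
    # 99/147: proper pair; 73: mate unmapped; 0: single-end.
    print(keep_proper_pairs([99, 147, 73, 0]))
```

Here only 99 and 147 survive, so any single-end or orphaned records would be lost by this filter.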

Thanks a lot!


mizraelson commented on August 17, 2024

No, no reads are being discarded. If a record has one read it will still be used for alignment.


xmy1990 commented on August 17, 2024

If a record has one read it will still be used for alignment.
Will single-end alignments affect the accuracy of the results?

One question still puzzles me: both the raw BAM data and the BAM preprocessed with samtools contained a mix of paired-end and single-end alignments, so why did MiXCR run successfully only on the raw BAM data?


mizraelson commented on August 17, 2024

Single-end alignments won't affect accuracy. If a read is aligned to the TCR/BCR reference, it will be used for analysis; if not, it will be discarded.

It's hard to tell without seeing the actual file. Do you mean that MiXCR (not the development version) successfully analyzed the full dataset? Are you certain that single-end alignments were present?


mizraelson commented on August 17, 2024

I will close the issue for now. Feel free to re-open.
