Comments (13)
Hi,
I will look into that and get back to you with a solution. Can you confirm it's a single-end read BAM file?
from mixcr.
I can confirm it is a paired-end read BAM file!
Thank you!
Does the same error occur if you run MiXCR on the raw data that hasn't been preprocessed with samtools?
I have run MiXCR successfully on the raw BAM data (50+ GB), but it took more than 10 hours. I preprocessed the BAM with samtools, selecting the V(D)J regions with a target BED file, only to speed MiXCR up.
Do you have any other suggestions for improving MiXCR speed on large raw BAM data (50+ GB)?
I have used --threads 8, but the speed still did not meet my expectations.
Thanks a lot!
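The exact samtools command was not shared; one common way to do this kind of target selection is `samtools view -L targets.bed`, which keeps alignments overlapping the BED intervals. The sketch below only illustrates the overlap test such a filter performs, assuming BED's 0-based half-open intervals and SAM's 1-based POS. The `ref_span` argument (reference bases covered by the read, normally derived from the CIGAR) is passed in directly, and the target interval is a hypothetical placeholder, not a real V(D)J coordinate.

```python
from typing import List, Tuple

# Toy BED interval: (chrom, start, end), 0-based and half-open, as in the BED format.
BedInterval = Tuple[str, int, int]


def overlaps_targets(chrom: str, pos1: int, ref_span: int, targets: List[BedInterval]) -> bool:
    """True if a read at 1-based SAM POS `pos1` covering `ref_span` reference bases
    overlaps any target interval on the same chromosome."""
    start = pos1 - 1          # convert SAM's 1-based POS to a 0-based start
    end = start + ref_span    # half-open end
    # Two half-open intervals overlap iff each one starts before the other ends.
    return any(
        t_chrom == chrom and start < t_end and t_start < end
        for t_chrom, t_start, t_end in targets
    )


targets = [("chr14", 1000, 2000)]  # hypothetical target, not a real V(D)J coordinate
print(overlaps_targets("chr14", 2000, 100, targets))
```

Note that a read can pass this test while its mate does not, which matters for the paired-end discussion below.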
I've been able to replicate the error you encountered. The issue arises from the fact that the BAM file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned. We're actively working on a solution for this, and I will get back to you with a fix as soon as possible.
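Whether a BAM record belongs to a pair, and whether it or its mate aligned, is encoded in the SAM FLAG field. The sketch below decodes the relevant bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, as defined in the SAM specification) to show how "only one read of the pair is aligned" appears in the data. It is an illustration of the flag semantics only, not of how MiXCR handles these records.

```python
# SAM FLAG bits (SAM/BAM format specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def record_kind(flag: int) -> str:
    """Classify a BAM record by its FLAG field."""
    if not flag & PAIRED:
        return "single-end"
    if flag & UNMAPPED and flag & MATE_UNMAPPED:
        return "pair, both mates unmapped"
    if flag & UNMAPPED or flag & MATE_UNMAPPED:
        return "pair, only one mate aligned"
    return "pair, both mates aligned"


for f in (0, 99, 73, 133, 77):
    print(f, record_kind(f))
```

For example, flag 73 (paired, mate unmapped, first in pair) and flag 133 (paired, unmapped, second in pair) are both the "only one mate aligned" case.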
Separately, may I ask how many CPUs you're currently using? Given that 50+ GB raw files are substantial, the most resource-intensive step in exome-seq is alignment, which depends heavily on the CPU. If you're working within a cluster environment, increasing the number of CPUs allocated for MiXCR should enhance the processing speed.
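A practical point is to make `--threads` match the CPUs the scheduler actually grants; in this thread `--threads 8` was combined with `num_proc=16`, so half of the requested CPUs may have sat idle. The sketch below builds a `--threads` argument from the allocation. It assumes a Grid Engine-style scheduler that exports `NSLOTS` (the `virtual_free`/`num_proc` resource names suggest one, but this is an assumption), and falls back to the process's CPU affinity mask on Linux. The function names are hypothetical.

```python
import os
from typing import List, Optional


def allocated_cpus() -> int:
    """Best-effort count of CPUs this job may use."""
    # Assumption: a Grid Engine-style scheduler exports NSLOTS with the number of granted slots.
    nslots = os.environ.get("NSLOTS", "")
    if nslots.isdigit() and int(nslots) > 0:
        return int(nslots)
    # On Linux, the affinity mask lists the CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def threads_args(requested: Optional[int] = None) -> List[str]:
    """Build a --threads argument that matches the CPUs actually allocated."""
    n = requested if requested is not None else allocated_cpus()
    return ["--threads", str(max(1, n))]
```

The resulting list can be spliced into whatever MiXCR command line the job script already uses; requesting more threads than granted CPUs only oversubscribes them.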
Great, I look forward to your fix.
To speed up MiXCR, I have tried:
--threads 8, virtual_free=32g, num_proc=16 (which relate to the number of CPUs)
With the above parameters, some samples ran successfully, while other samples (even with the same BAM size as the successful ones) failed
with the error below:
java.lang.IllegalStateException: Unexpected buffer behaviour. This might be a sign that you ran out of storage in the output or tmp folder.
at com.milaboratory.o.zR.completed(SourceFile:1582)
at com.milaboratory.o.Bk$a.completed(SourceFile:1163)
at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)
at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl$3.run(SimpleAsynchronousFileChannelImpl.java:389)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Also, the speed did not improve significantly.
Do you have any other suggestions for setting --threads, virtual_free, and num_proc to improve MiXCR speed on large raw BAM data (50+ GB)? I'm not particularly skilled at setting these parameters.
Looking forward to your response, thanks!
I have another question about your detailed reply:
I've been able to replicate the error you encountered. The issue arises from the fact that the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned. We're actively working on a solution for this and I will get back to you with a fix asap.
As I understand it, both the raw BAM data (not preprocessed with samtools) and the BAM data preprocessed with samtools would match the description
"the bam file often contains a mix of paired-end and single-end alignments, e.g. if only one read of the pair is aligned."
So why did MiXCR run successfully on the raw BAM data but fail on the BAM data preprocessed with samtools?
Thanks a lot!
Hi,
What cluster architecture do you use to run MiXCR? Could you please share the exact script you're using, including all the parameters (like virtual_free, num_proc, etc.)? The error you're experiencing seems to stem from a storage space limitation. With the --use-local-temp parameter, MiXCR will use the local storage space where the program is running, rather than defaulting to the /tmp directory, which often resolves this issue.
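Before launching a large job, it can help to compare free space in the default temp directory with free space in the working directory. The sketch below uses Python's `shutil.disk_usage` and `tempfile.gettempdir()` for that check. How much temporary space MiXCR needs is not stated in this thread, so `needed_gib` is a guess you supply, and treating the working directory as the location `--use-local-temp` writes to is an assumption based on the description above. The helper names are hypothetical.

```python
import shutil
import tempfile
from typing import List

GIB = 1024 ** 3


def temp_flag_for(tmp_free_gib: float, work_free_gib: float, needed_gib: float) -> List[str]:
    """Suggest --use-local-temp when /tmp is too small but the working directory is not."""
    if tmp_free_gib < needed_gib <= work_free_gib:
        return ["--use-local-temp"]
    return []


def check_temp_space(workdir: str, needed_gib: float) -> List[str]:
    # shutil.disk_usage reports total/used/free bytes for the filesystem holding the path.
    tmp_free = shutil.disk_usage(tempfile.gettempdir()).free / GIB
    work_free = shutil.disk_usage(workdir).free / GIB
    return temp_flag_for(tmp_free, work_free, needed_gib)
```

Keeping the decision in a pure function (`temp_flag_for`) makes the rule easy to test without depending on the disks of the machine it runs on.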
Regarding your other question, my hypothesis is that with the full genome, all reads align somewhere. They might not necessarily align in a highly targeted manner and might instead be part of lower-score alignments. However, when you filter data by chromosome, some reads may lose their mates, because the mate aligned to a different genomic location and was filtered out.
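The hypothesis can be illustrated with a toy example: records whose mates both survive a chromosome filter stay paired, while a pair split across chromosomes leaves an orphan. The records, QNAMEs, and the `chr14` filter below are hypothetical; secondary (0x100) and supplementary (0x800) alignments are excluded so that each mate is counted once.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Toy record: (QNAME, FLAG, reference name)
Record = Tuple[str, int, str]

PAIRED = 0x1
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def orphaned_mates(records: Iterable[Record]) -> Set[str]:
    """QNAMEs of paired primary records whose mate is absent from `records`."""
    counts = Counter(
        qname
        for qname, flag, _ in records
        if flag & PAIRED and not flag & SECONDARY_OR_SUPPLEMENTARY
    )
    return {qname for qname, n in counts.items() if n == 1}


records = [
    ("r1", 99, "chr14"), ("r1", 147, "chr14"),   # both mates on chr14
    ("r2", 97, "chr14"), ("r2", 145, "chr2"),    # mate aligned elsewhere
]
kept = [r for r in records if r[2] == "chr14"]   # filter by chromosome
print(orphaned_mates(records), orphaned_mates(kept))  # set() {'r2'}
```

In the unfiltered data every paired record still has its mate; after filtering, `r2` is left as a lone mate.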
Additionally, the fix is now available for you to test. Please download the development version. Let me know how it works for you.
Hi,
I greatly appreciate your prompt response!
I have tested the development version; it performed well on my limited set of samples, and the processing speed also met my needs.
Separately, may I ask how you handled the mix of paired-end and single-end alignments? Did you just discard the single-end alignments (like samtools view -f 0x2, with the flags described here: http://www.htslib.org/doc/samtools.html)?
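For reference, `samtools view -f INT` keeps only records that have all bits of INT set in FLAG, and `-F INT` drops records that have any of those bits set. The sketch below reproduces those two tests in Python to show what `-f 0x2` (properly paired) would keep. It illustrates the filter proposed in the question, not what the development version does.

```python
from typing import List

PROPER_PAIR = 0x2


def keep_required(flag: int, required: int) -> bool:
    # Like `samtools view -f INT`: every bit in INT must be set in FLAG.
    return (flag & required) == required


def keep_excluded(flag: int, excluded: int) -> bool:
    # Like `samtools view -F INT`: no bit in INT may be set in FLAG.
    return (flag & excluded) == 0


flags: List[int] = [99, 147, 73, 0]
print([f for f in flags if keep_required(f, PROPER_PAIR)])  # [99, 147]
```

Records such as flag 73 (only one mate aligned) and unpaired records (flag 0) would be removed by this filter.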
Thanks a lot!
No, no reads are being discarded. If a record has one read it will still be used for alignment.
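As a toy illustration of "no reads are being discarded", the sketch below groups records by QNAME so that complete pairs stay together and a lone mate is passed on as a single-read input instead of being dropped. It assumes primary alignments only (at most two records per QNAME), and `build_inputs` is a hypothetical helper, not MiXCR code.

```python
from typing import Dict, List, Tuple


def build_inputs(records: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Group (QNAME, sequence) records; pairs stay together, lone mates stay single."""
    by_name: Dict[str, List[str]] = {}
    for qname, seq in records:
        by_name.setdefault(qname, []).append(seq)
    # Nothing is dropped: a QNAME with one sequence becomes a single-read input.
    return [tuple(seqs) for seqs in by_name.values()]


print(build_inputs([("r1", "ACGT"), ("r2", "GGCC"), ("r1", "TTAA")]))
# [('ACGT', 'TTAA'), ('GGCC',)]
```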
If a record has one read it will still be used for alignment.
Will single-end alignments affect the accuracy of the results?
There's one lingering question I haven't been able to fully grasp: both the raw BAM data and the BAM preprocessed with samtools contained a mix of paired-end and single-end alignments, so why did MiXCR run successfully only on the raw BAM data?
Single-end alignments won't affect accuracy. If a read is aligned to the TCR/BCR reference, it will be used for analysis; if not, it will be discarded.
It's hard to tell without seeing the actual file. Do you mean that MiXCR (not the development version) successfully analyzed the full dataset? Are you certain that single-end alignments were present?
I will close the issue for now. Feel free to re-open.